Selective oxidation of 5-methylcytosine by TET-family proteins

ABSTRACT

The present invention provides for novel methods for regulating and detecting the cytosine methylation status of DNA. The invention is based upon identification of a novel and surprising catalytic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. The novel activity is related to the enzymes being capable of converting the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine by hydroxylation.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application under 35 U.S.C. § 120 of co-pending U.S. application Ser. No. 16/169,801 filed Oct. 24, 2018, which is a continuation application under 35 U.S.C. § 120 of co-pending U.S. application Ser. No. 15/341,344 filed Nov. 2, 2016, which is a continuation application under 35 U.S.C. § 120 of co-pending U.S. application Ser. No. 15/193,796 filed Jun. 27, 2016, which is a continuation application under 35 U.S.C. § 120 of U.S. application Ser. No. 13/795,739 filed Mar. 12, 2013, now U.S. Pat. No. 9,447,452, issued Sep. 20, 2016, which is a continuation application under 35 U.S.C. § 120 of U.S. application Ser. No. 13/120,861 filed on Jun. 7, 2011, now U.S. Pat. No. 9,115,386, issued Aug. 25, 2015, which is a 35 U.S.C. § 371 National Phase Entry Application of International Application No. PCT/US2009/058562 filed Sep. 28, 2009, which designates the United States, and which claims benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application Ser. No. 61/100,503 filed Sep. 26, 2008, U.S. Provisional Patent Application Ser. No. 61/100,995 filed Sep. 29, 2008, and U.S. Provisional Patent Application Ser. No. 61/121,844 filed on Dec. 11, 2008, the contents of which are incorporated herein in their entirety by reference.

GOVERNMENT SUPPORT

This invention was made with Government Support under Grant No. R01 AI44432 and Grant No. K08 HL089150 awarded by the National Institutes of Health (NIH). The Government has certain rights in the invention.

FIELD OF THE INVENTION

The present invention relates to enzymes with novel hydroxylase activity and methods for uses thereof, and methods of labeling and detecting methylated residues.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Mar. 24, 2011, is named 20110324_Seq_List_TXT_033393_063004_US.TXT and is 147,751 bytes in size.

BACKGROUND OF THE INVENTION

DNA methylation and demethylation play vital roles in various aspects of mammalian development, as well as in somatic cells during differentiation and aging. Importantly, these processes are known to become highly aberrant during tumorigenesis and cancer (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006)).

In mammals, DNA methylation occurs primarily on cytosine in the context of the dinucleotide CpG. DNA methylation is dynamic during early embryogenesis and plays crucial roles in parental imprinting, X-inactivation, and silencing of endogenous retroviruses. Embryonic development is accompanied by major changes in the methylation status of individual genes, whole chromosomes and, at certain times, the entire genome (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006)). For example, there is active genome-wide demethylation of the paternal genome shortly after fertilization (W. Mayer, Nature 403: 501-502 (2000); J. Oswald, Curr Biol 10: 475-478 (2000)). DNA demethylation is also an important mechanism by which germ cells are reprogrammed: the development of primordial germ cells (PGC) during early embryogenesis involves widespread DNA demethylation mediated by an active (i.e., replication-independent) mechanism (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); P. Hajkova, Nature 452: 877-881 (2008); N. Geijsen, Nature 427: 148-154 (2004)).

De novo DNA methylation and demethylation mechanisms are also prominent in somatic cells during differentiation and aging. Expression of differentiation-specific genes in somatic cells is often accompanied by progressive DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007)). Tight regulation of DNA demethylation is a feature of pluripotent stem cells and progenitor cells in cellular differentiation pathways, which could contribute to the ability of these cells to self-renew, as well as give rise to daughter differentiating cells (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); S. Simonsson, Nat Cell Biol 6: 984-990 (2004); R. Blelloch, Stem Cells 24: 2007-2013 (2006)).

It is believed that two important aspects of stem cell function, pluripotency and self-renewal ability, require proper DNA demethylation, and hence, the ability to manipulate these stem cell functions could be improved by controlled expression of enzymes in the DNA demethylation pathway. The epigenetic reprogramming of somatic nuclei during somatic cell nuclear transfer (SCNT) may also require proper control of DNA demethylation pathways (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); S. Simonsson (2004); R. Blelloch (2006)). For optimal efficiency of cloning by SCNT, regulated DNA demethylation may be required for nuclear reprogramming in the transferred somatic cell nucleus. Moreover, correct regulation of DNA demethylation could improve the efficiency with which induced pluripotent stem cells (iPS cells) are generated from adult fibroblasts or other somatic cells using pluripotency factors (K. Takahashi, Cell 126: 663-676 (2006); K. Takahashi, Cell 131: 861-872 (2007); J. Yu, Science 318: 1917-1920 (2007)).

DNA methylation processes are known to be highly aberrant in cancer. Overall, the genomes of cancer cells show a global loss of methylation, but additionally tumor suppressor genes are often silenced through increased methylation (L. T. Smith, Trends Genet 23: 449-456 (2007); E. N. Gal-Yam, Annu Rev Med 59: 267-280 (2008); M. Esteller, Nature Rev Cancer 8: 286-298 (2007); M. Esteller, N Engl J Med 358: 1148-1159 (2008)). Thus, oncogenesis is associated with aberrant regulation of the DNA methylation/demethylation pathway. Moreover, the self-renewing population of cancer stem cells can be characterized by high levels of DNA demethylase activity. Furthermore, in cultured breast cancer cells, gene expression in response to oestrogen has been shown to be accompanied by waves of apparent DNA demethylation and remethylation not coupled to replication (R. Métivier, Nature 452: 45-50 (2008); S. Kangaspeska, Nature 452: 112-115 (2008)). It is presently unknown whether this apparent demethylation is due to full conversion of 5-methylcytosine (5mC) to cytosine, or whether it reflects a partial modification of 5-methylcytosine to a base not recognized by methyl-binding proteins or antibodies to 5-methylcytosine.

DNA demethylation can proceed by two possible mechanisms: a “passive” replication-dependent demethylation, or a process of active demethylation for which the molecular basis is still unknown. The passive demethylation mechanism is fairly well understood and is typically observed during cell differentiation, where it accompanies the increased expression of lineage-specific genes (D. U. Lee, Immunity 16: 649-660 (2002)). Ordinarily, hemimethylated CpGs are generated during cell division as a result of replication of symmetrically methylated DNA. These hemimethylated CpGs are recognized by DNA methyltransferase 1 (Dnmt1), which then transfers a methyl group to the opposing unmethylated cytosine to restore the symmetrical pattern of DNA methylation (H. Leonhardt, Cell 71: 865-873 (1992); L. S. Chuang, Science 277: 1996-2000 (1997)). If Dnmt1 activity or localization is inhibited, remethylation of the CpG on the opposite strand does not occur and only one of the two daughter strands retains cytosine methylation.
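The passive, replication-dependent loss of methylation described above can be illustrated with a simple model. This is a minimal sketch, not a model taken from the source, and it rests on stated assumptions:

- Each replication pairs every parental strand with a new, unmethylated strand.
- Dnmt1 restores methylation at a hemimethylated CpG with a fixed probability, `maintenance`, which is a hypothetical parameter.
- Neither de novo methylation nor active demethylation occurs.

With no maintenance, the methylated fraction halves with every division.

```python
def strand_methylation_after_divisions(m0: float, divisions: int, maintenance: float) -> float:
    """Fraction of CpG strands still methylated after `divisions` replications.

    Simplified model (assumption): parental strands keep their marks, each new
    strand is methylated only if its parental partner was methylated AND Dnmt1
    acts (probability `maintenance`); no de novo or active demethylation.
    """
    if not 0.0 <= maintenance <= 1.0:
        raise ValueError("maintenance must be between 0 and 1")
    m = m0
    for _ in range(divisions):
        # After replication the strand count doubles: methylated parental
        # strands contribute m/2, newly methylated daughter strands m*maintenance/2.
        m = m * (1.0 + maintenance) / 2.0
    return m
```

For example, with `maintenance=0.0` a fully methylated site falls to 12.5% of strands after three divisions. With `maintenance=1.0`, the site stays fully methylated indefinitely.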

In contrast, enzymes with the ability to demethylate DNA by an active mechanism have not been identified as molecular entities. There is evidence that active DNA demethylation occurs in certain carefully controlled circumstances, such as shortly after fertilization and during early development of primordial germ cells (PGC) (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); P. Hajkova, Nature 452: 877-881 (2008); N. Geijsen, Nature 427: 148-154 (2004)). The mechanism of active demethylation is not known, though various disparate mechanisms have been postulated (reviewed in H. Cedar, Nature 397: 568-569 (1999); S. K. Ooi, Cell 133: 1145-1148 (2008)). However, no proteins with these postulated activities have been reliably identified to date.

Overall, the identification of molecules that play a role in active demethylation, and of methods to screen for changes in the methylation status of DNA, would be important for developing novel therapeutic strategies that interfere with or induce demethylation, and for monitoring changes in the methylation status of cellular DNA.

SUMMARY OF THE INVENTION

The present invention provides for novel methods for regulating and detecting the cytosine methylation status of DNA. The invention is based upon identification of a novel and surprising catalytic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. The novel activity is related to the enzymes being capable of converting the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine by hydroxylation.

The invention provides, in part, novel methods and reagents to promote the reprogramming of somatic cells into pluripotent cells, for example, by increasing the rate and/or efficiency by which induced pluripotent stem (iPS) cells are generated, and for modulating pluripotency and cellular differentiation status. The inventors have made the surprising discovery that members of the TET family of enzymes are highly expressed in ES cells and iPS cells. A gain in pluripotency is associated with induction of members of the TET family of enzymes and the presence of 5-hydroxymethylcytosine. Conversely, a loss of pluripotency suppresses TET family enzyme expression and results in a loss of 5-hydroxymethylcytosine. Thus, the TET family of enzymes provides a novel set of non-transcription factor targets that can be used to modulate and regulate the differentiation status of cells. Accordingly, the invention provides novel reagents, such as TET family enzymes, functional TET family derivatives, or TET catalytic fragments, for the reprogramming of somatic cells into pluripotent stem cells. This novel and surprising activity of the TET family proteins, and derivatives thereof, could also provide a way of improving the function of stem cells generally, that is, any kind of stem cell, not just iPS cells. Examples include, but are not limited to:

- neuronal stem cells used to create dopaminergic neurons administered to patients with Parkinson's or other neurodegenerative diseases;
- muscle stem cells administered to patients with muscular dystrophies;
- skin stem cells useful for treating burn patients; and
- pancreatic islet stem cells administered to patients with type I diabetes.

The invention also provides novel methods of diagnosing and treating individuals at risk for or having a myeloid cancer, such as a myeloproliferative disorder (MPD), a myelodysplastic syndrome (MDS), an acute myeloid leukemia (AML), a systemic mastocytosis, and a chronic myelomonocytic leukemia (CMML). The inventors have made the surprising discovery that TET family mutations have significant and profound effects on the hydroxymethylation status of DNA in cells. Such defects can be detected using the methods of the invention, such as bisulfite treatment of nucleic acids and antibody-based detection of cytosine methylene sulfonate.
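A short sketch shows why bisulfite treatment on its own reads 5-hydroxymethylcytosine as if it were 5-methylcytosine. It assumes standard bisulfite chemistry:

- Unmodified cytosine is deaminated to uracil and read as T.
- 5-methylcytosine resists deamination.
- 5-hydroxymethylcytosine is converted to cytosine-5-methylenesulfonate, which also resists deamination.

It also assumes a read already aligned base-for-base to its reference. The function name and call labels are hypothetical, and this is not presented as the detection method of the invention.

```python
def call_cpg_sites(reference: str, read: str) -> dict[int, str]:
    """Classify each reference CpG cytosine from an aligned bisulfite read.

    Assumption: `read` is the converted sequence aligned position-by-position
    to `reference` (no indels). A retained C means the cytosine was protected
    from deamination, i.e. it was 5mC OR 5hmC (as cytosine methylene sulfonate).
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned and of equal length")
    calls: dict[int, str] = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            base = read[i]
            if base == "C":
                calls[i] = "5mC_or_5hmC"   # protected: cannot tell 5mC from 5hmC
            elif base == "T":
                calls[i] = "unmodified_C"  # deaminated to U, read as T
            else:
                calls[i] = "ambiguous"     # sequencing error or polymorphism
    return calls
```

For example, `call_cpg_sites("ACGTTCGACGA", "ACGTTTGATGA")` returns `{1: '5mC_or_5hmC', 5: 'unmodified_C', 8: 'unmodified_C'}`. Because the first call cannot separate the two modified bases, a readout that is specific to the sulfonate adduct adds information.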

One aspect of the present invention also provides a method for improving the generation of stable human regulatory Foxp3+ T cells. The method comprises contacting a human T cell with, or delivering to a human T cell, an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytic fragment, or combination thereof. In one embodiment, one uses the entire protein of TET1, TET2, TET3, and CXXC4, or a nucleic acid molecule encoding such protein.

In one embodiment, the method of generating human regulatory Foxp3+ T cells further comprises contacting the human T cell with a composition comprising cytokines, growth factors, and activating reagents. In one embodiment, the composition comprising cytokines, growth factors, and activating reagents comprises TGF-β.

Accordingly, in one aspect, the invention provides a method for improving the efficiency or rate with which induced pluripotent stem (iPS) cells can be produced from adult somatic cells. In one embodiment of this aspect, the method comprises contacting a somatic cell with, or delivering to a somatic cell being treated to undergo reprogramming, an effective amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytic fragment, or combination thereof. This is done in combination with one or more known pluripotency factors, in vitro or in vivo. In one embodiment, one uses the entire catalytically active TET1, TET2, TET3, or CXXC4 protein, or a nucleic acid encoding such protein. In one embodiment, only a functional TET1, TET2, TET3, or CXXC4 derivative is used. In one embodiment, only a TET1, TET2, TET3, or CXXC4 catalytic fragment is used.

In one embodiment of this aspect, reprogramming is achieved by delivery of a combination of one or more nucleic acid sequences encoding Oct-4, Sox2, c-Myc, and Klf4 to a somatic cell. In another embodiment, the nucleic acid sequences of Oct-4, Sox2, c-Myc, and Klf4 are delivered using a viral vector, such as an adenoviral vector, a lentiviral vector, or a retroviral vector.

Another object of the invention is to provide a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation.

Accordingly, in one aspect, the invention provides a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation. The method comprises contacting a nucleus isolated from a cell during a typical nuclear transfer protocol with an effective hydroxylation-inducing amount of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof.

The invention is based, in part, upon identification of a novel and surprising hydroxylase activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4, wherein the hydroxylase activity converts the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine. However, 5-hydroxymethylcytosine is recognized neither by the 5-methylcytosine binding protein MeCP2 (V. Valinluck, Nucleic Acids Research 32: 4100-4108 (2004)) nor by specific monoclonal antibodies directed against 5-methylcytosine. Novel and inventive methods to detect 5-hydroxymethylcytosine are therefore required.

Accordingly, one object of the present invention is directed to methods for the detection of the 5-hydroxymethylcytosine nucleotide in a sample.

In one aspect of the invention, an assay based on thin-layer chromatography (TLC) is used to detect 5-hydroxymethylcytosine in a sample. In other aspects, the methods described herein generally involve direct detection of 5-hydroxymethylcytosine with agents that recognize and specifically bind to it. These methods can be used singly or in combination to determine the hydroxymethylation status of cellular DNA or sequence information. In one aspect, these methods can be used to detect 5-hydroxymethylcytosine in cell nuclei for the purposes of immunohistochemistry. In another aspect, these methods can be used to immunoprecipitate DNA fragments containing 5-hydroxymethylcytosine from crosslinked DNA by chromatin immunoprecipitation (ChIP).
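The sketch below shows one way relative nucleotide levels might be read off a TLC plate. The source does not state how spots are quantified, so several assumptions apply:

- Each spot has an integrated phosphorimager intensity.
- One background value is subtracted from every spot.
- All nucleotide species are labeled with equal efficiency.

The spot labels, numbers, and function name are hypothetical.

```python
def species_fractions(spots: dict[str, float], background: float = 0.0) -> dict[str, float]:
    """Fraction of total labeled cytosine signal in each TLC spot.

    Assumptions (not from the source): integrated spot intensities, a single
    background value subtracted from every spot, equal labeling efficiency.
    """
    # Clamp at zero so a spot fainter than background does not go negative.
    corrected = {name: max(value - background, 0.0) for name, value in spots.items()}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal above background")
    return {name: value / total for name, value in corrected.items()}
```

With illustrative intensities `{"dC": 900.0, "5mdC": 60.0, "5hmdC": 40.0}`, the 5-hydroxymethylcytosine spot accounts for 4% of the labeled cytosine signal. Comparing that fraction between transfected and mock samples mirrors the comparisons described for FIGS. 7 and 8.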

Accordingly, in one embodiment of the aspects described herein, an antibody or antigen-binding portion thereof that specifically binds to 5-hydroxymethylcytosine is provided. In one embodiment, a hydroxymethylcytosine-specific antibody, or hydroxymethylcytosine-specific binding fragment thereof, is provided to detect a 5-hydroxymethylcytosine nucleotide. Levels of unmethylated cytosine, methylated cytosine, and hydroxymethylcytosine can also be assessed by using proteins that bind CpG, hydroxymethyl-CpG, methyl-CpG, or hemi-methylated CpG as probes. Examples of such proteins are known (Ohki et al., EMBO J 1999; 18: 6653-6661; Allen et al., EMBO J 2006; 25: 4503-4512; Arita et al., Nature 2008; doi:10.1038/nature07249; Avvakumov et al., Nature 2008; doi:10.1038/nature07273). In some embodiments of these aspects, it may be desirable to engineer the antibody or antigen-binding portion thereof to increase its binding affinity or selectivity for the 5-hydroxymethylcytosine target site. In one embodiment, an antibody or antigen-binding fragment thereof that specifically binds cytosine-5-methylenesulfonate is used to detect a 5-hydroxymethylcytosine nucleotide in a sample.

In one aspect, the invention also provides methods for screening for signaling pathways that activate or inhibit TET family enzymes at the transcriptional, translational, or posttranslational levels.

In one aspect, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or DNA encoding any of these, is used to generate nucleic acids containing hydroxymethylcytosine from nucleic acids containing 5-methylcytosine. In an alternative embodiment, these are used to generate other oxidized pyrimidines from appropriate free or nucleic acid precursors.

Yet another object of the present invention is to provide a kit comprising materials for performing methods according to the aspects of the invention as described herein.

In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or DNA encoding any of these. The kit contents are to be contacted with or delivered to a cell, or plurality of cells.

In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof. It further comprises one or more compositions comprising cytokines, growth factors, and activating reagents for the purposes of generating stable human regulatory T cells. In one preferred embodiment, the composition comprising cytokines, growth factors, and activating reagents comprises TGF-β. In a preferred embodiment, the kit includes packaging materials and instructions therein to use said kit.

In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments, or DNA encoding any of these. It also comprises a combination of the nucleic acid sequences for Oct-4, Sox2, c-Myc, and Klf4, for the purposes of improving the efficiency or rate of the generation of induced pluripotent stem cells. In one embodiment, the nucleic acid sequences for Oct-4, Sox2, c-Myc, and Klf4 are delivered in a viral vector selected from the group consisting of an adenoviral vector, a lentiviral vector, and a retroviral vector. In a further embodiment, the kit includes packaging materials and instructions therein to use said kit.

In one embodiment, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or DNA encoding any of these. The kit contents are to be contacted with or delivered to a cell, or plurality of cells, for the purposes of improving the efficiency of cloning mammals by nuclear transfer. In a further embodiment, the kit includes packaging materials and instructions therein to use said kit.

In some embodiments, the kit also comprises reagents suitable for detecting the activity of one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, namely the production of 5-hydroxymethylcytosine from 5-methylcytosine. In one embodiment, the kit comprises a reagent that specifically binds to 5-hydroxymethylcytosine. Such a reagent is an antibody or binding portion thereof, a CXXC domain of a TET family protein, or another DNA-binding protein. In other embodiments, the kit includes packaging materials and instructions therein to use said kit. In other embodiments, recombinant TET proteins are provided in a kit to generate nucleic acids containing hydroxymethylcytosine from nucleic acids containing 5-methylcytosine. Alternatively, the recombinant TET proteins are used to generate other oxidized pyrimidines from appropriate free or nucleic acid precursors.

The present invention, in part, relates to novel methods and compositions that enhance stem cell therapies. One aspect of the present invention includes compositions and methods of inducing stem cells to differentiate into a desired cell type. A stem cell is contacted with, or has delivered to it, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof, or nucleic acid encoding any of these, or any combination thereof, to increase pluripotency of said cell being contacted. Such cells can then be utilized for stem cell therapy treatments, wherein said contacted cell can undergo further manipulations to differentiate into a desired cell type for use in treatment of a disorder requiring cell or tissue replacement.

The present invention also provides, in part, improved methods for the treatment of cancer by the administration of compositions modulating catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments thereof. Also encompassed in the methods of the present invention are methods for screening for the identification of TET family modulators.

Accordingly, in one aspect, the invention provides a method for treating an individual with, or at risk for, cancer using one or more modulators of the activity of the TET family of proteins. In one embodiment, the method comprises selecting a treatment for a patient affected by, or at risk for developing, cancer by determining the presence or absence of hypermethylated CpG island promoters of tumor suppressor genes. If hypermethylation of tumor suppressor genes is detected, one administers to the individual an effective amount of a tumor suppressor activity-reactivating catalytically active TET family enzyme, a functional TET family derivative, a TET catalytic fragment thereof, or an activating modulator of TET family activity.

In one embodiment of this aspect, the treatment involves the administration of a TET family inhibiting modulator. In particular, the TET family inhibiting modulator is specific for TET1, TET2, TET3, or CXXC4. In one embodiment of the invention, the cancer being treated is a leukemia. In one embodiment, the leukemia is acute myeloid leukemia caused by the t(10;11)(q22;q23) Mixed Lineage Leukemia translocation of TET1.

In one embodiment of the present aspect, and other aspects described herein, the TET family targeting modulator is a TET family inhibitor. In one embodiment, the TET targeting treatment is specific for the inhibition of TET1, TET2, TET3, or CXXC4. Examples include a small molecule inhibitor, a competitive inhibitor, an antibody or antigen-binding fragment thereof, or a nucleic acid that inhibits TET1, TET2, TET3, or CXXC4.

In one embodiment of the present aspect, and other aspects described herein, the TET family targeting modulator is a TET family activator. Alternatively and preferably, the TET targeting treatment is specific for the activation of TET1, TET2, TET3, or CXXC4. Examples include a small molecule activator, an agonist, an antibody or antigen-binding fragment thereof, or a nucleic acid that activates TET1, TET2, TET3, or CXXC4.

Also encompassed in the methods and assays of the present invention are methods to screen for the identification of a TET family modulator for use in anti-cancer therapies. The method comprises:

a) providing a cell comprising a TET family enzyme, recombinant TET family enzyme thereof, TET family functional derivative, or TET family fragment thereof;

b) contacting said cell with a test molecule;

c) comparing the relative levels of 5-hydroxymethylated cytosine in cells expressing the TET family enzyme, recombinant TET family enzyme thereof, TET family functional derivative, or TET family fragment thereof in the presence of the test molecule with the level of 5-hydroxymethylated cytosine measured in a control sample in the absence of the test molecule; and

d) determining whether or not the test molecule increases or decreases the level of 5-hydroxymethylated cytosine, wherein a statistically significant decrease in the level of 5-hydroxymethylated cytosine indicates the molecule is an inhibitor, and a statistically significant increase indicates the molecule is an activator.
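Step d) can be sketched as follows. The source does not specify which statistical test establishes significance. This minimal sketch uses an exact two-sided permutation test on the difference in mean 5-hydroxymethylcytosine level as one possible choice. The replicate design, the significance level `alpha = 0.05`, and the function names are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated: list[float], control: list[float]) -> float:
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    # Enumerate every way of relabeling n_treated of the pooled values as "treated".
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def classify_test_molecule(treated: list[float], control: list[float], alpha: float = 0.05):
    """Label a test molecule from replicate 5hmC levels (hypothetical helper)."""
    p = permutation_p_value(treated, control)
    if p >= alpha:
        return "no significant effect", p
    if mean(treated) < mean(control):
        return "inhibitor", p
    return "activator", p
```

With four replicates per arm, the smallest attainable two-sided p-value is 2/70 (about 0.029). At least four replicates per arm are therefore needed to reach `alpha = 0.05` with this test.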

In another embodiment of this aspect, a method for high-throughput screening for anti-cancer agents is provided. The method comprises screening for and identifying TET family modulators. For example, a combinatorial library containing a large number of potential therapeutic compounds (potential modulator compounds) is provided. Such “combinatorial chemical libraries” are then screened in one or more assays to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity. Examples of such activity include inhibition of TET family-mediated 5-methylcytosine to 5-hydroxymethylcytosine conversion, or activation of that conversion.
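A common way to organize such a screen is sketched below: check plate quality with the widely used Z'-factor, then normalize each well's 5-hydroxymethylcytosine signal between two control types. The "full activity" controls are vehicle only. The "no activity" controls contain, for example, a catalytically inactive TET mutant such as those described for FIGS. 6 to 8. The Z'-factor is a standard screening statistic swapped in here, not one named in the source. The hit thresholds (50% and 150% of control activity) and all function names are hypothetical.

```python
from statistics import mean, stdev


def z_prime(full_activity: list[float], no_activity: list[float]) -> float:
    """Z'-factor: 1 - 3*(sd_full + sd_none) / |mean_full - mean_none|."""
    spread = 3.0 * (stdev(full_activity) + stdev(no_activity))
    window = abs(mean(full_activity) - mean(no_activity))
    return 1.0 - spread / window


def percent_activity(signal: float, full_activity: list[float], no_activity: list[float]) -> float:
    """Scale a well's 5hmC signal so inactive-control mean = 0% and vehicle mean = 100%."""
    low, high = mean(no_activity), mean(full_activity)
    return 100.0 * (signal - low) / (high - low)


def triage(library: dict[str, float], full_activity: list[float], no_activity: list[float],
           low: float = 50.0, high: float = 150.0) -> dict[str, str]:
    """Flag compounds whose normalized activity falls outside hypothetical thresholds."""
    hits: dict[str, str] = {}
    for compound, signal in library.items():
        pct = percent_activity(signal, full_activity, no_activity)
        if pct <= low:
            hits[compound] = "candidate inhibitor"
        elif pct >= high:
            hits[compound] = "candidate activator"
    return hits
```

A Z'-factor above about 0.5 is conventionally taken to indicate a well-separated assay. Hits from `triage` would then go through replicate testing such as the comparison in step d) above.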

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 depicts the chemical structures for cytosine, 5-methylcytosine, 5-hydroxymethylcytosine, and cytosine-5-methylenesulfonate.

FIG. 2 depicts the conversion of 5-methylcytosine to 5-hydroxymethylcytosine that can be mediated by a catalytically active TET family enzyme, functional TET family derivative, or TET catalytic fragment.

FIGS. 3A-3B show the various conversions mediated by enzymes encoded by the “T-even” family of bacteriophages. FIGS. 3A-3B show that alpha-glucosyltransferases add glucose in the alpha configuration, and beta-glucosyltransferases add glucose in the beta configuration. FIGS. 3A-3B also show that beta-glucosyl-HMC-alpha-glucosyl-transferases add another glucose molecule in the beta configuration to glucosylated 5-hydroxymethylcytosine.

FIG. 4 depicts a method by which methylcytosine and 5-hydroxymethylcytosine can be detected in, and isolated from, nucleic acids for use in downstream applications.

FIG. 5 identifies the TET subfamily as having structural features characteristic of enzymes that oxidize 5-methylpyrimidines. FIG. 5 is a schematic diagram of the domain structure of the TET subfamily proteins. These proteins include the CXXC domain, the “C” or Cys-rich domain, and the 2OG-Fe(II) oxygenase domain containing a large, low-complexity insert.

FIG. 6 demonstrates that overexpression of catalytically active TET subfamily proteins leads to decreased staining with a monoclonal antibody directed against 5-methylcytosine. FIG. 6 shows the relation between 5-methylcytosine staining and high expression of HA on a per-cell basis using the Cell Profiler program. The mean intensity of 5-methylcytosine staining decreases in the presence of catalytically active full-length TET1 (FL) or the C+D domains of TET1 (C+D), but not when the catalytic activity is abrogated (FL mut or C+D mut). FIG. 6 also presents the 5-methylcytosine staining data of FIG. 6B normalized to the levels of the mock-transfected sample.
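The per-cell analysis summarized for FIG. 6 can be sketched as follows. The source states only two things: 5-methylcytosine staining was related to HA expression per cell with the Cell Profiler program, and the result was normalized to mock-transfected cells. Everything else here is hypothetical:

- the input layout, one (HA, 5mC) intensity pair per segmented nucleus;
- the HA gating threshold;
- the function names.

```python
from statistics import mean


def mean_5mc_in_ha_high(cells: list[tuple[float, float]], ha_threshold: float) -> float:
    """Mean 5mC intensity over nuclei whose HA intensity reaches the (hypothetical) threshold."""
    selected = [mc for ha, mc in cells if ha >= ha_threshold]
    if not selected:
        raise ValueError("no cells above the HA threshold")
    return mean(selected)


def normalize_to_mock(sample_means: dict[str, float], mock_mean: float) -> dict[str, float]:
    """Express each construct's mean 5mC staining relative to the mock-transfected mean."""
    return {name: value / mock_mean for name, value in sample_means.items()}
```

Gating on HA-high cells restricts the comparison to cells that actually express the tagged construct. Normalizing to mock puts the FL, C+D, and mutant samples on a common scale.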

FIGS. 7A-7E demonstrate that TET1 expression leads to the generation of a novel nucleotide. FIG. 7 depicts line scans of labeled spots on a TLC plate, obtained by phosphorimaging, from assays to detect a novel nucleotide in genomic DNA of cells transfected with various constructs. FIG. 7A shows the line scan from mock-transfected cells. FIG. 7B shows the line scan from cells transfected with catalytically active full-length TET1 (FL). FIG. 7C shows the line scan from cells transfected with catalytically inactive TET1 (FL mut). FIG. 7D shows the line scan from cells transfected with TET1 catalytic fragment (C+D). FIG. 7E shows the line scan from cells transfected with mutant TET1 catalytic fragment (C+D mut).

FIGS. 8A-8C demonstrate that TET1 expression leads to the generation of a novel nucleotide. FIG. 8 depicts line scans of labeled spots on a TLC plate, obtained using a phosphorimager. A novel nucleotide is observed only in DNA from cells transfected with the catalytically active (C+D) fragment of TET1 (FIG. 8B). It is not observed in DNA from cells transfected with empty vector (FIG. 8A) or with the catalytically inactive mutant version of (C+D) (FIG. 8C).

FIG. 9 identifies the novel nucleotide as 5-hydroxymethylcytosine by showing that the unknown nucleotide is identical to authentic 5-hydroxymethylcytosine obtained from T4 phage grown in GalU-deficient E. coli hosts. FIG. 9 depicts the results of LC/MS/MS runs using mass spectrometry analysis with a collision energy of 15 V.
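The mass-spectrometric identification rests on a simple mass difference. Hydroxylating the 5-methyl group adds one oxygen atom, so 5-hydroxymethylcytosine species are about 15.995 u heavier than the corresponding 5-methylcytosine species. The sketch below computes monoisotopic masses of the deoxynucleoside monophosphates from their elemental formulas. The source does not state which ions were monitored, so the [M+H]+ helper assumes positive-mode ionization, and the variable names are illustrative.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
PROTON = 1.007276466812  # proton mass (u), added for an [M+H]+ ion

# Elemental formulas of the 2'-deoxynucleoside 5'-monophosphates.
DCMP = {"C": 9, "H": 14, "N": 3, "O": 7, "P": 1}     # dCMP
MDCMP = {"C": 10, "H": 16, "N": 3, "O": 7, "P": 1}   # 5-methyl-dCMP
HMDCMP = {"C": 10, "H": 16, "N": 3, "O": 8, "P": 1}  # 5-hydroxymethyl-dCMP


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum of monoisotopic atomic masses for a neutral molecule."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def protonated(formula: dict[str, int]) -> float:
    """m/z of the singly protonated ion (assumes positive-mode ionization)."""
    return monoisotopic_mass(formula) + PROTON
```

Here `monoisotopic_mass(DCMP)` is about 307.057. The 5-hydroxymethyl-dCMP minus 5-methyl-dCMP difference is one oxygen, about 15.995 u.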

FIG. 10 shows that a recombinant protein comprising the catalytic domain (C+D) of human TET1, expressed from a baculovirus expression vector in insect Sf9 cells, is active in converting 5-methylcytosine to 5-hydroxymethylcytosine in vitro. FIG. 10 also depicts the relative activity of the recombinant C+D fragment of TET1 in the presence of various combinations of Fe2+, ascorbic acid, α-KG, and EDTA.

FIGS. 11A-11I demonstrate the physiological importance of TET1 in gene regulation. FIG. 11A shows that TET1 mRNA is strongly upregulated after 8 h of stimulation of mouse dendritic cells (DC) with LPS. FIGS. 11B-11I show the changes in Tet1, Tet2 and Tet3 mRNA levels in mouse ES cells induced to differentiate by withdrawal of leukemia inhibitory factor (LIF) and addition of retinoic acid (RA). During RA-induced differentiation, Tet1, Tet2, and the positive control pluripotency gene Oct4 are downregulated (FIGS. 11B-11E and FIGS. 11H-11I), whereas Tet3 is upregulated (FIGS. 11F-11G).

FIGS. 12A-12F show the effect of Tet RNAi on ES cell lineage gene marker expression, using cells treated with Tet1, Tet2 and Tet3 siRNAs. FIG. 12A shows that Tet siRNA inhibits Tet1 expression. FIG. 12B shows the effect of siRNA-mediated Tet1 inhibition on Oct4. FIG. 12C shows the effect of siRNA-mediated Tet1 inhibition on Sox2. FIG. 12D shows the effect of siRNA-mediated Tet1 inhibition on Nanog. FIG. 12E shows the effect of siRNA-mediated Tet1 inhibition on Cdx2. FIG. 12F shows the effect of siRNA-mediated Tet1 inhibition on Gata6.

FIGS. 13A-13C show the identification of 5-hydroxymethylcytosine as the catalytic product of TET1-mediated conversion of 5-methylcytosine, and the detection of 5-hydroxymethylcytosine in the genome of mouse ES cells. FIG. 13A shows a schematic diagram of the predicted domain structure of TET1, comprising the CXXC domain [Allen, M. D., et al., Embo J, 2006. 25(19): p. 4503-12], a cysteine-rich region, and double-stranded beta-helix (DSBH) regions. FIG. 13B depicts TLC data from cells overexpressing full-length (FL) TET1 or the predicted catalytic domain (CD), revealing the appearance of an additional nucleotide species identified by mass spectrometry as 5-hydroxymethylcytosine. The H1671Y and D1673A mutations at residues predicted to bind Fe(II) abrogate the ability of TET1 to generate 5-hydroxymethylcytosine. FIG. 13C shows that 5-hydroxymethylcytosine is detected in the genome of mouse ES cells.

FIGS. 14A-14B depict the role of murine Tet1 and Tet2 in the catalytic generation of 5-hydroxymethylcytosine in ES cells. FIG. 14A depicts that the mouse genome encodes three family members, Tet1, Tet2 and Tet3, that share significant sequence homology with their human homologs (Lorsbach, R. B., et al., Leukemia, 2003. 17(3): p. 637-41). Tet1 and Tet3 encode the CXXC domain within their first conserved coding exon. FIG. 14B shows that mouse ES cells express high levels of Tet1 and Tet2, which can be specifically depleted with RNAi.

FIGS. 15A-15D show the changes in Tet family gene expression that occur in mouse ES cells upon differentiation. FIG. 15A shows that Tet1 mRNA levels rapidly decline upon LIF withdrawal. FIG. 15B shows that Tet2 mRNA levels rapidly decline upon LIF withdrawal. FIG. 15C demonstrates that Tet3 levels remain low upon LIF withdrawal but increase 10-fold with the addition of retinoic acid. FIG. 15D shows that Oct4 mRNA levels rapidly decline upon LIF withdrawal, as expected.

FIGS. 16A-16E show that Tet1, Tet2 and 5-hydroxymethylcytosine are associated with pluripotency. FIGS. 16A-16C show that the loss of pluripotency induced by RNAi-mediated depletion of Oct4 potently suppresses Tet1 (FIG. 16A) and Tet2 (FIG. 16B) expression and upregulates Tet3 (FIG. 16C). Sox2 RNAi was found to cause an effect similar to, though weaker than, that of Oct4 RNAi, and Nanog RNAi had almost no effect. FIGS. 16D-16E show that the gain of pluripotency in iPS clones derived from mouse tail-tip fibroblasts (TTF) by viral transduction of Oct4, Sox2, Klf4 and c-Myc is associated with upregulation of Tet1 (FIG. 16D) and Tet2 (FIG. 16E) and with the appearance of 5-hydroxymethylcytosine in the genome.

FIGS. 17A-17I show the effect of Tet knockdown on ES cell pluripotency and differentiation genes. FIGS. 17A-17C show that RNAi-mediated knockdown of each Tet member does not affect expression of the pluripotency factors Oct4 (FIG. 17A), Sox2 (FIG. 17B) and Nanog (FIG. 17C). FIGS. 17D-17F demonstrate that RNAi depletion of Tet1, but not of Tet2 or Tet3, increases the expression of the trophectodermal genes Cdx2 (FIG. 17D), Eomes (FIG. 17E) and Hand1 (FIG. 17F). FIGS. 17G-17I demonstrate that RNAi depletion of Tet family members produces small, insignificant changes in the expression of the extraembryonic endoderm, mesoderm and primitive ectoderm markers Gata6 (FIG. 17G), Brachyury (FIG. 17H), and Fgf5 (FIG. 17I).

FIG. 18 shows the theoretical amount of 5-hydroxymethylcytosine versus the amount quantified by bisulfite sequencing in samples in the absence or presence of various TET family siRNA inhibitors.

FIG. 19 illustrates an assay to detect cytosine methylene sulfonate from bisulfite-treated samples.

FIGS. 20A-20B show the correlation between dot intensity and the amount of cytosine methylene sulfonate (FIG. 20A) or 5-hydroxymethylcytosine (FIG. 20B) present in a sample.

FIGS. 21A-21B show the results of analyses of 5-hydroxymethylcytosine present in samples obtained from patients diagnosed with cancer with or without mutations in TET2, by analysis of dot 3 (FIG. 21A) and dot 4 (FIG. 21B) on TLC plates.

FIGS. 22A-22B depict real-time PCR analyses of various oligonucleotides in the presence or absence of bisulfite treatment. FIG. 22A shows the amplification plots under the various experimental conditions, and FIG. 22B summarizes the data expressed as the change in cycle threshold (Ct).

FIG. 23 depicts the reaction of sodium bisulfite with cytosine, 5-methylcytosine, and 5-hydroxymethylcytosine.
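As a rough illustration of what the reaction in FIG. 23 means for sequence read-out, the sketch below models each base after complete bisulfite conversion and PCR. It assumes the commonly reported outcomes: unmodified cytosine is deaminated to uracil and read as T, 5-methylcytosine is unreactive and read as C, and 5-hydroxymethylcytosine forms cytosine methylene sulfonate (CMS), which, where the template is amplified at all, is also read as C. The base labels "mC" and "hmC" and the function name are hypothetical.

```python
from typing import Iterable

# Assumed, idealized read-out after bisulfite treatment and PCR (100% conversion):
#   C   -> deaminated to U, read as T
#   mC  -> 5-methylcytosine, unreactive, read as C
#   hmC -> 5-hydroxymethylcytosine, converted to CMS, read as C if amplified
BISULFITE_READOUT = {"A": "A", "G": "G", "T": "T", "C": "T", "mC": "C", "hmC": "C"}


def bisulfite_read(bases: Iterable[str]) -> str:
    """Return the sequence expected after complete bisulfite conversion and PCR."""
    return "".join(BISULFITE_READOUT[base] for base in bases)


template = ["A", "C", "G", "mC", "G", "hmC", "G"]
print(bisulfite_read(template))  # ATGCGCG: 5mC and 5hmC give the same read
```

Under this model the two modified bases are indistinguishable by conventional bisulfite sequencing alone, which motivates assays that detect CMS directly, such as the one depicted in FIG. 19.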

FIGS. 24A-24B show the sequences (SEQ ID NO: 18 and SEQ ID NO: 19, respectively) and primers (SEQ ID NO: 8 and SEQ ID NO: 10, respectively) used to determine whether cytosine methylene sulfonate impedes PCR amplification of DNA.

FIG. 25 shows the results of real-time PCR analysis of various oligonucleotides before and after bisulfite treatment, expressed as a change in cycle threshold.
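A change in cycle threshold can be translated into a relative amount of amplifiable template. The sketch below assumes the standard exponential amplification model in which each cycle multiplies product by (1 + efficiency), with perfect doubling by default; the text does not state how the Ct changes were interpreted, and the Ct values shown are hypothetical.

```python
def delta_ct(ct_after: float, ct_before: float) -> float:
    """Change in cycle threshold after bisulfite treatment (positive = later amplification)."""
    return ct_after - ct_before


def relative_amplifiable_template(delta: float, efficiency: float = 1.0) -> float:
    """Fraction of amplifiable template remaining, given a change in Ct.

    Assumes each PCR cycle multiplies product by (1 + efficiency); efficiency=1.0
    means perfect doubling, so a shift of +1 Ct corresponds to half as much template.
    """
    return (1.0 + efficiency) ** (-delta)


shift = delta_ct(ct_after=25.0, ct_before=22.0)
print(shift, relative_amplifiable_template(shift))  # 3.0 0.125
```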

FIGS. 26A-26C show the sequences (SEQ ID NOS 20-22, respectively) and primers (SEQ ID NOS 11-16, respectively) used to sequence bisulfite-treated genomic DNA from HEK293T cells, and the sequences and primers used to sequence the bisulfite-treated MLH1 amplicons. FIG. 26A depicts the sequence of the no-CG amplicon (SEQ ID NO:20), FIG. 26B shows the sequence of MLH1 amplicon 1 (SEQ ID NO:21), and FIG. 26C shows the sequence of MLH1 amplicon 2 (SEQ ID NO:22).

FIGS. 27A-27C depict line traces of bisulfite-treated genomic DNA in the absence or presence of a TET1 catalytic domain. FIG. 27A shows the line traces of MspI sites in the presence or absence of TET1. FIG. 27B shows the line traces of TaqαI sites in the presence or absence of TET1. FIG. 27C compares the mean cycle threshold for various amplicons in the absence or presence of TET1 treatment.

FIG. 28A depicts the generation of abasic sites from 5-hydroxymethylcytosine by glycosylases. FIG. 28B shows the specific reaction of abasic sites with aldehyde-reactive probes.

FIG. 29A shows the impact of TET1 expression on aldehyde density. FIG. 29B compares the impact of co-expression of MD4 on abasic sites and aldehyde density.

FIG. 30 shows the glucosylation of 5-hydroxymethylcytosine by β-glucosyltransferase.

FIG. 31 shows a schematic diagram depicting how the glucosylation of 5-hydroxymethylcytosine can be labeled using aldehyde quantification.

FIG. 32 compares aldehyde quantification of DNA under various conditions, including sodium bisulfite treatment and sodium periodate treatment.

FIG. 33 quantifies the amount of 5-hydroxymethylcytosine present in samples obtained from patients diagnosed with cancer with or without mutations in TET2.

FIG. 34 shows a schematic depicting the sites of various mutations found in TET2.

FIGS. 35A-35B show the expression of Tet2 in various myeloid and lymphoid lineage populations isolated from bone marrow and thymus. FIG. 35A shows Tet2 expression in myeloid lineage subpopulations, and FIG. 35B shows Tet2 expression in various lymphoid lineage subpopulations.

FIGS. 36A-36B show the expression of Tet1 in various myeloid and lymphoid lineage populations isolated from bone marrow and thymus. FIG. 36A shows Tet1 expression in myeloid lineage subpopulations, and FIG. 36B shows Tet1 expression in various lymphoid lineage subpopulations.

FIGS. 37A-37B show the reduction of TET2 mRNA and protein expression in cells upon treatment with siRNA sequences directed against TET2. FIG. 37A shows the reduction in mRNA expression, and FIG. 37B shows the reduction in Myc-tagged Tet2 protein.

FIG. 38 illustrates a potential link between abnormalities in energy metabolism and tumor suppression mediated by the TET family of enzymes.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides novel and improved methods for modulating the pluripotency and differentiation status of cells, novel methods for reprogramming somatic cells, novel research tools for use in the modulation of cellular gene transcription and in methylation studies, novel methods for detecting and isolating 5-methylcytosine and 5-hydroxymethylcytosine in nucleic acids, and novel methods for cancer treatment and related screening methods.

The invention is based upon identification of a novel and surprising enzymatic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. This novel enzymatic activity is the conversion of the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine by hydroxylation, mediated by the TET family of proteins. Accordingly, the invention provides novel tools for regulating the DNA methylation status of mammalian cells. Specifically, this enzymatic activity can be harnessed in methods for the generation of human Foxp3+ regulatory T cells, in the reprogramming of somatic cells, in stem cell therapy, in cancer treatment, in the modulation of cellular transcription, and as a research tool for DNA methylation studies.

DNA methylation is catalyzed by at least three DNA methyltransferases (DNMTs) that add a methyl group to the 5 position of the cytosine ring to form 5-methylcytosine. During S phase of the cell cycle, DNMTs at the replication fork copy the methylation pattern of the parent strand onto the daughter strand, making methylation patterns heritable over many generations of cell division. In mammalian genomes, this modification occurs almost exclusively on cytosine residues that precede guanine, i.e., at CpG dinucleotides. CpGs occur in the genome at a lower frequency than would be statistically predicted because methylated cytosines can spontaneously deaminate to form thymine. This substitution is not efficiently recognized by the DNA repair machinery, so C-to-T mutations accumulate during evolution. As a result, 99% of the genome is CpG-depleted. The remaining 1% is composed of discrete regions that have a high (G+C) and CpG content, known as CpG islands.
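The CpG depletion described above is commonly quantified as an observed/expected CpG ratio, i.e., the observed number of CpG dinucleotides divided by the number expected if C and G were independently distributed. The sketch below computes that ratio and GC content, and flags CpG-island-like sequences using the widely cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content of at least 50%, observed/expected ratio of at least 0.6). Those thresholds and all function names are assumptions for illustration; the text gives no numeric definition of a CpG island.

```python
def cpg_stats(seq: str) -> dict:
    """GC fraction and observed/expected CpG ratio of a DNA sequence.

    obs/exp = (number of CpG dinucleotides * length) / (number of C * number of G)
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds (assumed, not from the text)."""
    stats = cpg_stats(seq)
    return (len(seq) >= min_len
            and stats["gc_fraction"] >= min_gc
            and stats["cpg_obs_exp"] >= min_obs_exp)


print(looks_like_cpg_island("CG" * 100))    # True: CpG-rich
print(looks_like_cpg_island("CAGT" * 50))   # False: no CpG at all
```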

CpG islands are mostly found in the 5′ regulatory regions of genes, and 60% of human gene promoters are embedded in CpG islands. Although most CpG dinucleotides in the genome are methylated, the persistence of CpG islands suggests that they are not methylated in the germ line and thus did not undergo CpG depletion during evolution. Around 90% of CpG islands are estimated to be unmethylated in somatic tissues, and the expression of genes that contain CpG islands is not generally regulated by their methylation. However, under some circumstances CpG islands do become methylated, resulting in long-term gene silencing.

Regulated DNA methylation is essential for normal development, as mice lacking any one of the enzymes of the DNA methylation pathway die at embryonic stages or shortly after birth. As a silencing mechanism, DNA methylation plays a role in the normal transcriptional repression of repetitive and centromeric regions, X-chromosome inactivation in females, and genomic imprinting. The silencing mediated by DNA methylation occurs in conjunction with histone modifications and nucleosome remodeling, which together establish a repressive chromatin structure. In addition, many cancer cells have been shown to possess aberrant patterns of DNA methylation.

Because 5-hydroxymethylcytosine is not recognized by the 5-methylcytosine-binding protein MeCP2 (V. Valinluck, Nucleic Acids Research 32: 4100-4108 (2004)), conversion of 5-methylcytosine into 5-hydroxymethylcytosine could, without wishing to be limited by theory, result in loss of binding of MeCP2 and other 5-methylcytosine-binding proteins (MBDs) to DNA, interfere with chromatin condensation, and therefore result in loss of MBD-dependent gene silencing.

Additionally, because 5-hydroxymethylcytosine is not recognized by DNA methyltransferase 1 (Dnmt1), which remethylates hemimethylated regions of DNA, particularly during DNA replication (V. Valinluck and L. C. Sowers, Cancer Research 67: 946-950 (2007)), the oxidative conversion of 5-methylcytosine to 5-hydroxymethylcytosine would result in a net loss of 5-methylcytosine in favor of unmethylated cytosine over successive cycles of DNA replication, thereby facilitating the “passive” demethylation of DNA.
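The passive dilution argued above can be illustrated with an idealized model of semi-conservative replication at a single CpG site. The sketch assumes that the site starts symmetrically modified on both strands, that Dnmt1 copies 5-methylcytosine (but not 5-hydroxymethylcytosine) onto the new strand, and that no de novo methylation or active removal occurs. The mark labels and function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Duplex = Tuple[str, str]  # marks on the two strands at one CpG site


def replicate(duplexes: List[Duplex],
              maintained: FrozenSet[str] = frozenset({"mC"})) -> List[Duplex]:
    """One round of semi-conservative replication.

    Each parental strand templates a new strand synthesized as unmodified "C".
    Maintenance methylation copies the template mark onto the new strand only
    when that mark is in `maintained` (5mC here; 5hmC is not recognized).
    """
    daughters: List[Duplex] = []
    for strand_a, strand_b in duplexes:
        for template in (strand_a, strand_b):
            new_strand = template if template in maintained else "C"
            daughters.append((template, new_strand))
    return daughters


def simulate(start_mark: str, cycles: int) -> List[Duplex]:
    """Replicate a symmetrically marked duplex for the given number of cycles."""
    duplexes: List[Duplex] = [(start_mark, start_mark)]
    for _ in range(cycles):
        duplexes = replicate(duplexes)
    return duplexes


def strand_fraction(duplexes: List[Duplex], mark: str) -> float:
    """Fraction of all strands carrying the given mark."""
    strands = [s for duplex in duplexes for s in duplex]
    return sum(s == mark for s in strands) / len(strands)


print(strand_fraction(simulate("mC", 3), "mC"))    # 1.0: 5mC is maintained
print(strand_fraction(simulate("hmC", 3), "hmC"))  # 0.125: halved every cycle
```

In this model the modified fraction after n cycles is 0.5 to the power n, because only the two original parental strands ever carry 5-hydroxymethylcytosine.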

Finally, conversion of 5-methylcytosine to 5-hydroxymethylcytosine could also lie in the pathway of “active” demethylation if one postulates, without wishing to be bound by theory, that a specific DNA repair mechanism exists that recognizes 5-hydroxymethylcytosine and replaces it with cytosine. Without wishing to be limited by theory, the DNA repair mechanisms that could be utilized for recognition of 5-hydroxymethylcytosine include, but are not limited to: direct repair (B. Sedgwick, DNA Repair (Amst). 6(4):429-42 (2007)), base excision repair (M. L. Hegde, Cell Res. 18(1):27-47 (2008)), nucleotide incision repair (L. Gros, Nucleic Acids Res. 32(1):73-81 (2004)), nucleotide excision repair (S. C. Shuck, Cell Res. 18(1):64-72 (2008)), mismatch repair (G. M. Li, Cell Res. 18(1):85-98 (2008)), homologous recombination, and non-homologous end-joining (M. Shrivastav, Cell Res. 18(1):134-47 (2008)).

We identified a novel enzymatic activity for the TET family of proteins, namely that TET family proteins mediate the conversion of 5-methylcytosine in cellular DNA to 5-hydroxymethylcytosine by hydroxylation.

Methods of Improving the Reprogramming of Somatic Cells for the Production of Induced Pluripotent Stem Cells and for Use in Somatic Cell Nuclear Transfer

The present invention provides, in part, improved methods for the reprogramming of somatic cells into pluripotent stem cells by the administration of a composition containing at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof.

The data demonstrate a novel catalytic activity for the TET family of enzymes, specifically the ability to hydroxylate 5-methylcytosine (5mC) to an intermediate, 5-hydroxymethylcytosine (HMC), and the invention provides methods by which to detect this modification.

Accordingly, in one aspect, the invention provides a method for improving the efficiency or rate with which induced pluripotent stem (iPS) cells can be produced from adult somatic cells, comprising contacting a somatic cell being treated to undergo reprogramming with, or delivering to such a cell, an effective amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, one or more TET catalytic fragments thereof, or a combination thereof, in combination with one or more known pluripotency factors, in vitro or in vivo. In one embodiment, one uses at least one entire catalytically active TET1, TET2, TET3, or CXXC4 protein, or a nucleic acid encoding such a protein. In one embodiment, one uses at least one functional TET1, TET2, TET3, or CXXC4 derivative, or at least one nucleic acid encoding such a functional derivative. In one embodiment, one uses at least one TET1, TET2, TET3, or CXXC4 catalytically active fragment, or a nucleic acid encoding at least one such catalytically active fragment.

In another aspect, the invention provides a method for improving the efficiency or rate with which induced pluripotent stem (iPS) cells can be produced from adult somatic cells, comprising contacting a somatic cell being treated to undergo reprogramming with, or delivering to such a cell, an effective amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, or one or more TET catalytic fragments, and an effective amount of one or more inhibitors of TET family catalytic activity, in combination with one or more known pluripotency factors, in vitro or in vivo. In one embodiment, the catalytically active TET family enzyme, functional TET family derivative, or TET catalytic fragment is a catalytically active TET1 and/or TET2 enzyme, a functional TET1 and/or TET2 derivative, and/or a TET1 and/or TET2 catalytic fragment, and the inhibitor of TET family catalytic activity is a TET3 inhibitor that is specific for TET3 only. In one embodiment, the inhibitor of TET3 is an siRNA or shRNA sequence specific for inhibiting TET3.

The TET family of proteins, as referred to in this aspect and in all aspects and embodiments described herein, comprises the nucleotide sequences of TET1, TET2, TET3, and CXXC4 with GenBank nucleotide sequence IDs NM_030625.2 (TET1) (SEQ ID NO:23), NM_001127208.1 (TET2) (SEQ ID NO:24), NM_144993.1 (TET3) (SEQ ID NO:25), and NM_025212.1 (CXXC4) (SEQ ID NO:26), and the protein sequences of TET1, TET2, and CXXC4 with GenBank peptide sequence IDs NP_085128 (TET1) (SEQ ID NO:27), NP_001120680 (TET2) (SEQ ID NO:28), and NP_079488 (CXXC4) (SEQ ID NO:29).

As used herein, a “TET family protein” refers to the sequences of human TET1, TET2, TET3, and CXXC4, and to proteins having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or more homology to human TET1, TET2, or TET3 and displaying the catalytic (hydroxylating) activity of the TET family of proteins. A “functional TET family derivative”, as used herein, refers to a protein comprising a signature sequence, SEQ ID NO:1, from the catalytic site of the TET family proteins and having the catalytic activity of TET proteins.

SEQ ID NO: 1: GVAzAPxHGSzLIECAbxEzHATT

where x = any residue, z = an aliphatic residue from the group (L, I, V), and b = a basic residue from the group (R, K).
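The residue-class notation of SEQ ID NO: 1 maps directly onto a regular expression. The sketch below converts the motif and tests whether a protein sequence contains it. The function and variable names, and the example sequences, are hypothetical; the check covers only the signature, not the catalytic activity that the definition also requires.

```python
import re

# Residue classes from the definition: x = any residue,
# z = aliphatic (L, I, V), b = basic (R, K). Uppercase letters are literal.
RESIDUE_CLASSES = {"x": ".", "z": "[LIV]", "b": "[RK]"}
SEQ_ID_NO_1 = "GVAzAPxHGSzLIECAbxEzHATT"


def motif_to_regex(motif: str) -> str:
    """Translate the residue-class notation into a regular expression string."""
    return "".join(RESIDUE_CLASSES.get(ch, ch) for ch in motif)


SIGNATURE = re.compile(motif_to_regex(SEQ_ID_NO_1))


def has_signature(protein: str) -> bool:
    """True if the protein sequence contains the SEQ ID NO: 1 signature anywhere."""
    return SIGNATURE.search(protein.upper()) is not None


print(motif_to_regex(SEQ_ID_NO_1))  # GVA[LIV]AP.HGS[LIV]LIECA[RK].E[LIV]HATT
print(has_signature("MKTGVALAPQHGSILIECARTEVHATTQQ"))  # True
print(has_signature("GVALAPQHGSILIECADTEVHATT"))  # False: D is not basic
```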

A “TET catalytically active fragment”, as referred to herein, comprises a protein having the catalytic activity of TET family proteins and a sequence meeting one of the following criteria: (1) identical to the sequence of SEQ ID NO: 2 or one of the empirically verified catalytic fragments, or having homology of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or more, to such a sequence; or (2) incorporating a linear succession of the TET signature sequences of SEQ ID NO: 2, SEQ ID NO: 3, and SEQ ID NO: 4 in a defined order, which are predicted to form the core of the double-stranded beta-helix catalytic domain, or having homology of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or more, to such a linear succession of TET family signature sequences while preserving their linear order.

SEQ ID NO: 3: PFxGxTACxDFxAHxHxDxxN-[X]₅-TxVxTL-[X]₁₃-DEQxHVLPxY-[X]₀₋₇₈₀-GVAxAPxHGSxLIECAxxExHATT-[X]₁₁-RxSLVxYQH, wherein X is any amino acid residue.

SEQ ID NO: 4: PFxGxTACxDFxAHxHxDxxN-[X]₅-TxVxTL-[X]₁₂-DEQxHVLPxY-[X]₀₋₇₈₀-GVAxAPxHGSxLIECAxxExHAT-[X]₁₁-RxSLVxYQH, wherein X is any amino acid residue.

SEQ ID NO: 5: PFxGxTACxDFxxHxHxDxxN-[X]₂₋₁₁-TxVxTL-[X]₉₋₁₃-DEQxHVLPxY-[X]₀₋₇₈₀-GVAxAPxHGSxLIECAxxExHATT-[X]₅₋₁₃-RxSLVxYQH, wherein X is any amino acid residue.
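The gapped signatures above can likewise be expressed as a regular expression, with each [X] run becoming a bounded wildcard repeat. The sketch encodes SEQ ID NO: 3 as alternating motif blocks and (minimum, maximum) gap lengths; reading [X]₀₋₇₈₀ as "0 to 780 residues" is an interpretation, and the names and test sequence are hypothetical.

```python
import re
from typing import Sequence, Tuple, Union

Part = Union[str, Tuple[int, int]]

# SEQ ID NO: 3 as motif blocks and (min, max) gaps; lowercase "x" = any residue.
SEQ_ID_NO_3: Sequence[Part] = (
    "PFxGxTACxDFxAHxHxDxxN", (5, 5),
    "TxVxTL", (13, 13),
    "DEQxHVLPxY", (0, 780),
    "GVAxAPxHGSxLIECAxxExHATT", (11, 11),
    "RxSLVxYQH",
)


def compile_gapped_motif(parts: Sequence[Part]) -> "re.Pattern":
    """Join motif blocks and gap ranges into one compiled regular expression."""
    pieces = []
    for part in parts:
        if isinstance(part, tuple):
            low, high = part
            pieces.append(f".{{{low},{high}}}")  # e.g. ".{0,780}"
        else:
            pieces.append(part.replace("x", "."))
    return re.compile("".join(pieces))


PATTERN = compile_gapped_motif(SEQ_ID_NO_3)

# Hypothetical test sequence: every "x" filled with A, every gap filled with K.
example = ("PFAGATACADFAAHAHADAAN" + "K" * 5 + "TAVATL" + "K" * 13
           + "DEQAHVLPAY" + "K" * 10 + "GVAAAPAHGSALIECAAAEAHATT"
           + "K" * 11 + "RASLVAYQH")
print(bool(PATTERN.search(example)))  # True
```

SEQ ID NO: 4 and SEQ ID NO: 5 can be encoded the same way by changing the blocks and gap ranges.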

The human TET3 peptide sequence, as described herein, comprises SEQ ID NO: 6, as well as the sequence described by GenBank Peptide ID: NP_659430.
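The definitions above rely on homology thresholds of at least 70% up to at least 99%, but the text does not specify how homology is computed. The sketch assumes one common reading, percent identity over an existing pairwise alignment (producing the alignment, for example with BLAST, is outside its scope). The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ("-" = gap).

    Columns where both sequences have a gap are ignored; a residue opposite a
    gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if (a, b) != ("-", "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_homology_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if percent identity is at least `threshold` (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("GVAIAPAHGS", "GVALAPAHGS"))  # 90.0
print(percent_identity("AC-T", "ACGT"))              # 75.0
```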

In connection with contacting a cell with, or delivering to a cell, a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, the phrase “increasing the efficiency” of induced pluripotent stem (iPS) cell production indicates that the proportion of reprogrammed cells in a given population is at least 5% higher in populations treated with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof than in a comparable control population in which no catalytically active TET family enzyme, functional TET family derivative, or TET catalytically active fragment thereof is present. In one embodiment, the proportion of reprogrammed cells in a cell population treated with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof is at least 10% higher, at least 15% higher, at least 20% higher, at least 25% higher, at least 30% higher, at least 35% higher, at least 40% higher, at least 45% higher, at least 50% higher, at least 55% higher, at least 60% higher, at least 65% higher, at least 70% higher, at least 75% higher, at least 80% higher, at least 85% higher, at least 90% higher, at least 95% higher, at least 98% higher, at least 1-fold higher, at least 1.5-fold higher, at least 2-fold higher, at least 5-fold higher, at least 10-fold higher, at least 25-fold higher, at least 50-fold higher, at least 100-fold higher, at least 1000-fold higher, or more than in a control treated cell population of comparable size and culture conditions.
The phrase “control treated cell population of comparable size and culture conditions” is used herein to describe a population of cells that has been treated with identical media, viral induction, nucleic acid sequences, temperature, confluency, flask size, pH, etc., with the exception of the addition of the catalytically active TET family enzyme, functional TET family derivative, or TET catalytically active fragment thereof.
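The efficiency criterion compares the proportion of reprogrammed cells in treated and control populations. The text does not say whether "at least 5% higher" is a relative increase or an absolute difference in percentage points; the sketch below assumes a relative increase. The function names and colony counts are hypothetical.

```python
def percent_increase(treated: float, control: float) -> float:
    """Relative increase (%) of the treated reprogramming proportion over control."""
    if control <= 0:
        raise ValueError("control proportion must be positive")
    return 100.0 * (treated - control) / control


def meets_efficiency_threshold(treated: float, control: float,
                               min_percent: float = 5.0) -> bool:
    """True if the treated proportion is at least `min_percent` higher (relative)."""
    return percent_increase(treated, control) >= min_percent


# Hypothetical: 30 vs 20 reprogrammed colonies per 100,000 seeded cells.
treated = 30 / 100_000
control = 20 / 100_000
print(round(percent_increase(treated, control), 6))  # 50.0
```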

By the phrase “increasing the rate” of iPS cell production is meant that the amount of time for the induction of iPS cells is at least 6 hours less, at least 12 hours less, at least 18 hours less, at least 1 day less, at least 2 days less, at least 3 days less, at least 4 days less, at least 5 days less, at least 6 days less, at least 1 week less, at least 2 weeks less, at least 3 weeks less, or more, in the presence of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof than in a control treated population of comparable size and culture conditions.
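The rate criterion is a minimum time saving relative to the control. The sketch below assumes that induction time is measured as a single duration per population (for example, time to first iPS colony, which is a hypothetical choice); the function names and timings are illustrative.

```python
from datetime import timedelta


def time_saved(control_time: timedelta, treated_time: timedelta) -> timedelta:
    """How much sooner iPS cells arise in the treated population."""
    return control_time - treated_time


def meets_rate_threshold(control_time: timedelta, treated_time: timedelta,
                         min_saving: timedelta = timedelta(hours=6)) -> bool:
    """True if induction is faster by at least `min_saving` (6 hours by default)."""
    return time_saved(control_time, treated_time) >= min_saving


# Hypothetical timings: first colonies at day 21 (control) vs day 14 (treated).
print(time_saved(timedelta(days=21), timedelta(days=14)))  # 7 days, 0:00:00
```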

The production of iPS cells, as practiced by those skilled in the art, is generally achieved by the introduction of nucleic acid sequences encoding stem cell-associated genes into an adult somatic cell. In general, these nucleic acids are introduced using retroviral vectors, and expression of the gene products results in cells that are morphologically and biochemically similar to pluripotent stem cells (e.g., embryonic stem cells). This process of altering a cell phenotype from a somatic cell phenotype to a stem cell-like phenotype is referred to herein as “reprogramming”.

Reprogramming can be achieved by introducing a combination of stem cell-associated genes, or pluripotency-inducing factors, such as Oct3/4 (Pou5f1), Sox1, Sox2, Sox3, Sox15, Sox18, NANOG, Klf1, Klf2, Klf4, Klf5, c-Myc, L-Myc, N-Myc and LIN28. In general, successful reprogramming is accomplished by introducing Oct-3/4, a member of the Sox family, a member of the Klf family, and a member of the Myc family into a somatic cell (K. Takahashi, Cell 126: 663-676 (2006); K. Takahashi, Cell 131: 861-872 (2007); J. Yu, Science 318: 1917-1920 (2007)).

Oct-3/4 (Pou5f1):

Oct-3/4 is one of the family of octamer (“Oct”) transcription factors and plays a crucial role in maintaining pluripotency. The absence of Oct-3/4 in Oct-3/4+ cells, such as blastomeres and embryonic stem cells, leads to spontaneous trophoblast differentiation; the presence of Oct-3/4 thus gives rise to the pluripotency and differentiation potential of embryonic stem cells.

Sox Family:

The Sox family of genes is associated with maintaining pluripotency, similar to Oct-3/4, although it is also associated with multipotent and unipotent stem cells, in contrast with Oct-3/4, which is exclusively expressed in pluripotent stem cells. While Sox2 was the initial gene used for induction by Yamanaka et al., Jaenisch et al., and Thomson et al., other genes in the Sox family have been found to work as well in the induction process. Sox1 yields iPS cells with an efficiency similar to that of Sox2, and Sox3, Sox15, and Sox18 also generate iPS cells, although with decreased efficiency.

Klf Family:

Klf4 of the Klf family of genes was initially identified by Yamanaka et al. and confirmed by Jaenisch et al. as a factor for the generation of mouse iPS cells, and was demonstrated by Yamanaka et al. to be a factor for the generation of human iPS cells. However, Thomson et al. reported that Klf4 was unnecessary for the generation of human iPS cells and in fact failed to generate human iPS cells. Klf2 and Klf4 have been found to be factors capable of generating iPS cells, and the related genes Klf1 and Klf5 did as well, although with reduced efficiency.

Myc Family:

The Myc family of genes are proto-oncogenes implicated in cancer. Yamanaka et al. and Jaenisch et al. demonstrated that c-myc is a factor implicated in the generation of mouse iPS cells, and Yamanaka et al. demonstrated that it is a factor implicated in the generation of human iPS cells. However, Thomson et al., Yamanaka et al., and unpublished work by Johns Hopkins University have reported that c-myc is unnecessary for the generation of human iPS cells. N-myc and L-myc have been found to substitute for c-myc in induction with similar efficiency.

Nanog:

In embryonic stem cells, Nanog, along with Oct-3/4 and Sox2, is necessary for promoting pluripotency. Yamanaka et al. have reported that Nanog is unnecessary for induction, although Thomson et al. have reported that it is possible to generate iPS cells with Nanog as one of the factors.

LIN28:

LIN28 is an mRNA-binding protein expressed in embryonic stem cells and embryonic carcinoma cells and associated with differentiation and proliferation. Thomson et al. demonstrated that it is a factor in iPS cell generation, although it is not required.

In one embodiment of the methods described herein, reprogramming is achieved by delivery of Oct-4, Sox2, c-Myc, Klf4, or any combination thereof, to a somatic cell (e.g., a fibroblast). In one embodiment of the methods described herein, reprogramming is achieved by delivery of at least one of Sox-2, Oct-4, Klf-4, c-Myc, Nanog, or Lin-28 to a somatic cell (e.g., a fibroblast). In one embodiment, reprogramming is achieved by delivery of the following four transcription factors, Sox-2, Oct-4, Klf-4, and c-Myc, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of three of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of two of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of one of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc, to a somatic cell. In one embodiment, reprogramming of a somatic cell is achieved in the absence of the following four transcription factors: Sox-2, Oct-4, Klf-4, and c-Myc.

In one embodiment, reprogramming is achieved by delivery of the following four transcription factors, Sox-2, Oct-4, Nanog, and Lin-28, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of any three of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of two of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28, to a somatic cell. In one embodiment, reprogramming is achieved by delivery of one of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28, to a somatic cell. In one embodiment, reprogramming is achieved in the absence of the following four transcription factors: Sox-2, Oct-4, Nanog, or Lin-28.
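The embodiments above choose four, three, two, one, or none of a fixed four-factor set, so each set yields 16 possible factor combinations. The sketch below enumerates them with the standard library; the variable and function names are hypothetical, and the empty combination corresponds to the embodiments in which none of the four factors is delivered.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

OSKM = ("Sox-2", "Oct-4", "Klf-4", "c-Myc")
SONL = ("Sox-2", "Oct-4", "Nanog", "Lin-28")


def factor_subsets(factors: Sequence[str], k: int) -> List[FrozenSet[str]]:
    """All ways of choosing exactly k factors from the given factor set."""
    return [frozenset(combo) for combo in combinations(factors, k)]


for k in range(len(OSKM), -1, -1):
    print(k, len(factor_subsets(OSKM, k)))
# 4 1
# 3 4
# 2 6
# 1 4
# 0 1
```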

In one embodiment, the nucleic acid sequences of one or more of Oct-4, Sox2, c-MYC, Klf4, Nanog, or Lin-28 are delivered using a viral vector or a plasmid. The viral vector can be, for example, a retroviral vector, a lentiviral vector, or an adenoviral vector. In some embodiments, the viral vector is a non-integrating viral vector. In one embodiment, reprogramming is achieved by introducing more than one non-integrating vector (e.g., 2, 3, 4, or more vectors) into a cell, wherein each vector comprises a nucleic acid sequence encoding a different reprogramming factor (e.g., Oct4, Sox2, c-Myc, Klf4, etc.). In an alternate embodiment, more than one reprogramming factor is encoded by a non-integrating vector, and expression of the reprogramming factors is controlled using a single promoter, polycistronic promoters, or multiple promoters.

Non-viral approaches to the introduction of nucleic acids known to those skilled in the art can also be used with the methods described herein. Alternatively, activation of the endogenous genes encoding such transcription factors can be used. In another embodiment, one or more proteins that reprogram the cell's differentiation state can be introduced into the cell. For example, proteins such as c-Myc, Oct4, Sox2 and/or Klf4 can be introduced into the cell through the use of HIV-TAT fusion. The TAT polypeptide has characteristics that permit it to penetrate the cell, and it has been used to introduce exogenous factors into cells (see, e.g., Peitz et al., 2002, Proc. Natl. Acad. Sci. USA. 99:4489-94). This approach can be employed to introduce factors for reprogramming the cell's differentiation state. While it is understood that reprogramming is usually accomplished by viral delivery of stem cell-associated genes, it is also contemplated that reprogramming can be induced using other delivery methods, such as delivery of the native, purified proteins (K. Takahashi, Cell 126: 663-676 (2006); K. Takahashi, Cell 131: 861-872 (2007); J. Yu, Science 318: 1917-1920 (2007)). In some embodiments, reprogramming can be induced using plasmid delivery methods, such as those described in Okita K, et al., Science, 2008 Nov. 7; 322(5903):949-53. In other embodiments, reprogramming is achieved by the use of recombinant proteins, such as via repeated treatment of the cells to be reprogrammed with certain proteins channeled into the cells via poly-arginine anchors. Such cells are termed herein “protein-induced pluripotent stem cells” or piPS cells, as described in H. Zhou et al., Cell Stem Cell, 4 (5), 8 May 2009, p. 381-384.

The efficiency of reprogramming (i.e., the number of reprogrammed cells) can be enhanced by the addition of various small molecules, as shown by Shi, Y., et al. (2008) Cell Stem Cell 2:525-528; Huangfu, D., et al. (2008) Nature Biotechnology 26(7):795-797; and Marson, A., et al. (2008) Cell Stem Cell 3:132-135, which are incorporated herein by reference in their entirety. It is contemplated that the methods described herein for increasing the efficiency or rate of iPS cell formation through the novel catalytic activity of one or more members of the TET family can also be used in combination with a single small molecule (or a combination of small molecules) that enhances the efficiency of induced pluripotent stem cell production. Some non-limiting examples of agents that enhance reprogramming efficiency include soluble Wnt, Wnt-conditioned media, BIX-01294 (a G9a histone methyltransferase inhibitor), PD0325901 (a MEK inhibitor), DNA methyltransferase inhibitors, histone deacetylase (HDAC) inhibitors, valproic acid, 5-azacytidine, dexamethasone, suberoylanilide hydroxamic acid (SAHA), trichostatin A (TSA), and inhibitors of the TGF-β signaling pathway, among others.

It is thus contemplated that inhibitors can be used, alone or in combination with other small molecule(s), to replace one or more of the reprogramming factors used in the methods to improve the efficiency or rate of iPS cell production by modulating TET family enzymatic activity as described. In some embodiments, one or more small molecules or other agents are used in place of (i.e., to replace or substitute for) exogenously supplied transcription factors, supplied either as a nucleic acid encoding the transcription factor or as a protein or polypeptide of the transcription factor, which are typically used in the production of iPS cells. As used herein, “exogenous” or “exogenously supplied” refers to the addition of a nucleic acid encoding a reprogramming transcription factor (e.g., a nucleic acid encoding Sox2, Klf4, Oct4, c-Myc, Nanog, or Lin-28) or of a polypeptide of a reprogramming factor (e.g., proteins of Sox2, Klf4, Oct4, c-Myc, Nanog, or Lin-28, or biologically active fragments thereof) that is normally used in the production of iPS cells. In some embodiments, reprogramming of a cell is achieved by contacting the cell with one or more agents, such as small molecules, where the agent (e.g., a small molecule) replaces the need to reprogram the differentiated cell with one or more of exogenous Sox2, Klf4, Oct4, c-Myc, Nanog, or Lin-28.

In one embodiment, the exogenous transcription factor Sox2 is replaced by an agent that is an inhibitor of the TGFβ signalling pathway, such as a TGFBR1 inhibitor. In other embodiments, a cell to be reprogrammed is contacted with small molecules or other agents that replace exogenously supplied Oct-4 and Klf-4.

Thus, the methods described herein include methods for producing reprogrammed cells from differentiated cells (e.g., from fibroblasts such as MEFs) without using exogenous oncogenes, for example c-Myc, or the oncogenes associated with introducing nucleic acid sequences encoding one or more of the transcription factors Sox-2, Oct-4, or Klf-4 into the differentiated cell to be reprogrammed (i.e., viral oncogenes). For example, chemically mediated reprogramming of differentiated cells makes it possible to create reprogrammed cells (i.e., iPS cells) from small numbers of differentiated cells, such as those obtained from patients' hair follicle cells, blood samples, adipose biopsies, fibroblasts, skin cells, etc. In some embodiments, the addition of small-molecule compounds allows successful and safe generation of reprogrammed cells (i.e., iPS cells) from human differentiated cells, such as skin biopsies (fibroblasts or other nucleated cells), as well as from differentiated cells of any other cell type. In one embodiment, an agent that is an agonist of MEK or Erk cell signalling replaces the exogenous transcription factor Klf-4. Examples of such agonists include prostaglandin J2, an inhibitor of Ca2+/calmodulin signaling, an EGF receptor tyrosine kinase inhibitor, or HDBA. In one embodiment, the exogenous transcription factor Oct-4 is replaced by an agent that is an inhibitor of Na+ channels, an agonist of ATP-dependent potassium channels, or an agonist of MAPK signalling pathways.

In general, iPS cells are produced by viral or non-viral delivery of said stem cell-associated genes into adult somatic cells (e.g., fibroblasts). While fibroblasts are preferred, essentially any primary somatic cell type can be used. Some non-limiting examples of primary somatic cells include epithelial cells, endothelial cells, neuronal cells, adipose cells, cardiac cells, skeletal muscle cells, immune cells (T, B, NK, NKT, dendritic cells, monocytes, neutrophils, eosinophils), hepatic cells, splenic cells, lung cells, circulating blood cells, gastrointestinal cells, renal cells, bone marrow cells, and pancreatic cells. The cell can be a primary cell isolated from any somatic tissue including, but not limited to, bone marrow, brain, pancreas, liver, lung, gut, stomach, intestine, fat, muscle, uterus, skin, spleen, thymus, kidney, endocrine organ, bone, etc. Where the cell is maintained under in vitro conditions, conventional tissue culture conditions and methods can be used, and are known to those of skill in the art. Isolation and culture methods for various cells are well within the abilities of one skilled in the art. Further, the parental cell can be from any mammalian species, with non-limiting examples including a murine, bovine, simian, porcine, equine, ovine, or human cell. The parental cell should not express embryonic stem (ES) cell markers, e.g., Nanog mRNA; thus, the presence of Nanog mRNA or other ES markers indicates that a cell has been reprogrammed. Where a fibroblast is used, the fibroblast is flattened and irregularly shaped prior to reprogramming, and does not express Nanog mRNA. The starting fibroblast preferably does not express other embryonic stem cell markers either. The expression of ES cell markers can be measured, for example, by RT-PCR. Alternatively, measurement can be by, for example, immunofluorescence or other immunological detection approaches that detect the presence of polypeptides or other features characteristic of the ES phenotype.

To confirm the induction of pluripotent stem cells, isolated clones can be tested for the expression of a stem cell marker. Such expression identifies the cells as induced pluripotent stem cells. Stem cell markers can be selected from the non-limiting group including SSEA1, CD9, Nanog, Fbx15, Ecat1, Esg1, Eras, Gdf3, Fgf4, Cripto, Dax1, Zfp296, Slc2a3, Rex1, Utf1, and Nat1. Methods for detecting the expression of such markers can include, for example, RT-PCR and immunological methods that detect the presence of the encoded polypeptides. The pluripotent stem cell character of the isolated cells can be confirmed by any of a number of tests evaluating the expression of ES markers and the ability to differentiate into cells of each of the three germ layers. As one non-limiting example, teratoma formation in nude mice can be used to evaluate the pluripotent character of the isolated clones: the cells are introduced into nude mice, and histology is performed on a tumor arising from the cells. The growth of a tumor comprising cells from all three germ layers (endoderm, mesoderm, and ectoderm) further indicates that the cells are pluripotent stem cells. The pluripotent stem cell character of the isolated cells can also be confirmed by the creation of chimeric mice. For example, the cells can be injected by micropipette into a blastocyst, and the blastocyst transferred to a recipient female, where the resulting chimeric live mouse pups (with, for example, 10%-90% chimerism) are indicative of successful generation of iPS cells. Tetraploid complementation can also be used to determine the pluripotent stem cell character of the isolated cells: the cells are injected into tetraploid blastocysts, which by themselves can only form extra-embryonic tissues, and the formation of whole, non-chimeric, fertile mice is indicative of successful generation of iPS cells (X-y Zhao et al., 2009, Nature. doi:10.1038/nature08267; L. Kang et al., 2009, Cell Stem Cell. doi:10.1016/j.stem.2009.07.001; and M. J. Boland et al., Nature. 2009 Aug. 2; 461(7260):91-94).

Another object of the invention is to provide a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation.

Accordingly, in one aspect the invention provides a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation, the method comprising contacting a nucleus isolated from a cell during a typical nuclear transfer protocol with an effective hydroxylation-inducing amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, one or more TET catalytically active fragments thereof, or any combination thereof.

In another aspect, the invention provides a method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation, the method comprising contacting a nucleus isolated from a cell during a typical nuclear transfer protocol with an effective amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, one or more TET catalytic fragments, or any combination thereof, and an effective amount of one or more inhibitors of TET family catalytic activity, in combination with at least one known factor that induces pluripotency, in vitro or in vivo. In one embodiment, the catalytically active TET family enzyme, functional TET family derivative, or TET catalytic fragment is a catalytically active TET1 and/or TET2 enzyme, a functional TET1 and/or TET2 derivative, a TET1 and/or TET2 catalytic fragment, or any combination thereof, and the inhibitor of TET family catalytic activity is a TET3 inhibitor. In one embodiment, the inhibitor of TET3 is an siRNA or shRNA sequence specific for TET3.

In one embodiment, the method comprises a typical nuclear transfer protocol. In a non-limiting example, the method comprises the steps of: (a) enucleating an oocyte; (b) isolating and permeabilizing a nucleated cell, thereby generating a permeabilized cell having pores in its plasma membrane, a partial plasma membrane, or no remaining plasma membrane; (c) dedifferentiating the permeabilized cell containing a nucleus from step (b), comprising contacting the nucleus with an effective hydroxylation-inducing amount of one or more catalytically active TET family enzymes, one or more functional TET family derivatives, and/or one or more TET catalytically active fragments thereof, under dedifferentiating conditions used by those skilled in the art; (d) transplanting the dedifferentiated nucleus formed in step (c) into a nucleated or enucleated egg such that the dedifferentiated nucleus is exposed to an activating egg cytoplasm, thereby forming a reconstituted oocyte, wherein the recipient egg is from the same species as the reprogrammed somatic cell nucleus; and (e) transferring the reconstituted oocyte, or an embryo formed from the reconstituted oocyte, into a host animal, thus allowing the egg to develop under the direction of the genetic information contained in the transplanted activated nucleus.

In connection with the administration of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, “improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation” indicates that the proportion of cloned mammals produced in the presence of exogenous catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof is at least 5% higher than in a comparable, control treated population. In one embodiment, the proportion of viable cloned mammals in a population treated with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment is at least 10% higher, at least 15% higher, at least 20% higher, at least 25% higher, at least 30% higher, at least 35% higher, at least 40% higher, at least 45% higher, at least 50% higher, at least 55% higher, at least 60% higher, at least 65% higher, at least 70% higher, at least 75% higher, at least 80% higher, at least 85% higher, at least 90% higher, at least 95% higher, at least 98% higher, at least 99% higher, or more than in a control treated population under comparable conditions, wherein no catalytically active TET family enzyme, functional TET family derivative, or TET catalytically active fragment is present. The term “control treated population under comparable conditions” is used herein to describe a population of permeabilized, nucleated cells that have been treated with identical media, viral induction, nucleic acid sequences, temperature, confluency, flask size, pH, etc., with the exception of the addition of the catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments, with all other steps in the protocol remaining identical.
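The text does not state whether “X% higher” is a relative increase or a difference in percentage points. The following Python sketch assumes the relative reading, i.e., (treated − control) / control; the helper names `cloning_efficiency`, `percent_increase`, and `meets_threshold`, and the counts used, are hypothetical and for illustration only.

```python
def cloning_efficiency(live_offspring: int, reconstructed_oocytes: int) -> float:
    """Proportion of reconstructed oocytes that yield viable cloned animals."""
    if reconstructed_oocytes <= 0:
        raise ValueError("need at least one reconstructed oocyte")
    return live_offspring / reconstructed_oocytes


def percent_increase(treated: float, control: float) -> float:
    """Relative increase (in %) of a treated proportion over its matched control.

    Assumption: "X% higher" is read as a relative change, not percentage points.
    """
    if control <= 0:
        raise ValueError("control proportion must be positive")
    return (treated - control) / control * 100.0


def meets_threshold(treated: float, control: float, threshold_pct: float = 5.0) -> bool:
    """True if the treated population clears the stated minimum improvement."""
    return percent_increase(treated, control) >= threshold_pct


# Hypothetical counts: 4 vs. 6 live clones per 200 reconstructed oocytes.
control = cloning_efficiency(4, 200)   # 2% of oocytes
treated = cloning_efficiency(6, 200)   # 3% of oocytes
print(f"{percent_increase(treated, control):.1f}% higher")  # 50.0% higher
```

Under this reading, moving from 2% to 3% counts as 50% higher; under a percentage-point reading the same data would be only one point higher, so the interpretation matters when comparing against the thresholds listed above.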

In one embodiment, somatic cells are cultured on a suitable growth medium for 5 or more passages (about 10 doublings in cell number), more preferably for 7 or more passages (about 14 doublings in cell number), more preferably for 10 or more passages (about 20 doublings in cell number), and yet more preferably for 15 passages (about 30 doublings in cell number). Upon each passage, cells are cultured until confluent, disaggregated by chemical and/or mechanical means, and allocated to new growth media.
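The passage counts above imply roughly two population doublings per passage (5 passages ≈ 10 doublings, 15 passages ≈ 30 doublings). The Python sketch below makes that arithmetic explicit and also shows the usual way of computing population doublings from cell counts, PD = log2(N_harvested / N_seeded). The two-doublings-per-passage default is taken from the ratios in the text; the function names and the cell counts are illustrative assumptions.

```python
import math


def approx_doublings(passages: int, doublings_per_passage: float = 2.0) -> float:
    """Rough doublings implied by a passage count (about 2 per passage here)."""
    return passages * doublings_per_passage


def population_doublings(seeded: float, harvested: float) -> float:
    """Doublings achieved in one passage, from seeded and harvested cell counts."""
    if seeded <= 0 or harvested <= 0:
        raise ValueError("cell counts must be positive")
    return math.log2(harvested / seeded)


def cumulative_doublings(passages: list[tuple[float, float]]) -> float:
    """Sum doublings over successive passages given (seeded, harvested) pairs."""
    return sum(population_doublings(s, h) for s, h in passages)


for p in (5, 7, 10, 15):
    d = approx_doublings(p)
    print(f"{p:>2} passages ~ {d:.0f} doublings ~ {2 ** d:.3g}-fold expansion")
```

Tracking measured counts with `cumulative_doublings` rather than relying on the fixed ratio accounts for cultures that expand faster or slower than about fourfold per passage.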

It is preferred that the donor cells of the invention be induced to quiescence prior to fusion with, or microinjection into, the recipient cell. In accord with the teachings of PCT/GB96/02099 and WO 97/07668, both assigned to the Roslin Institute (Edinburgh), it is preferred that the donor nucleus be in either the G0 or G1 phase of the cell cycle at the time of transfer. Donors must be diploid at the time of transfer in order to maintain correct ploidy. It is particularly preferred that the donor cells be in the G0 phase of the cell cycle.

While it is preferred that the recipient of the donor cell nucleus be an oocyte at metaphase I to metaphase II, the present invention may be used with other recipients known to those of ordinary skill in the art, including zygotes and two-cell embryos. Activation of oocytes can be by fertilization with sperm or by parthenogenetic activation schemes known in the art. It is particularly preferred that the recipient be enucleated. A preferred oocyte is an enucleated metaphase II oocyte, either non-activated or pre-activated. When the recipient is an enucleated metaphase II oocyte, activation may take place at the time of transfer.

It is preferred that the reconstituted oocyte be activated prior to implantation into the host using techniques known to those of ordinary skill in the art, such as electrical stimulation. As would be understood by one of ordinary skill in the art, activation techniques should be optimized for the particular cell type being used. Non-electrical means of activation known in the art include, but are not limited to, ethanol, protein kinase inhibitors (e.g., 6-dimethylaminopurine (DMAP)), ionophores (e.g., ionomycin), temperature change, protein synthesis inhibitors (e.g., cycloheximide), thapsigargin, phorbol esters (e.g., phorbol 12-myristate 13-acetate (“PMA”)), and mechanical means (see, e.g., Susko-Parrish, U.S. Pat. No. 5,496,720, issued Mar. 5, 1996).

Cultured donor cells may be genetically altered by methods well known to those of ordinary skill in the art (see Molecular Cloning: A Laboratory Manual, 2nd Ed., 1989, Sambrook, Fritsch and Maniatis, Cold Spring Harbor Laboratory Press; U.S. Pat. No. 5,612,205, Kay et al., issued Mar. 18, 1997; U.S. Pat. No. 5,633,067, to DeBoer et al., issued May 27, 1997). Any known method for inserting, deleting, or modifying a desired gene in a mammalian cell may be used to alter the nuclear donor. This includes homologous recombination, which allows the insertion, deletion, or modification of a gene or genes at a specific site or sites in the cell genome. Examples of methods for modifying a target DNA genome by deletion, insertion, and/or mutation include retroviral insertion, artificial chromosome techniques, gene insertion, random insertion with tissue-specific promoters, gene targeting, transposable elements, and/or any other method for introducing foreign DNA or producing modified DNA/modified nuclear DNA. Other modification techniques include deleting DNA sequences from a genome and/or altering nuclear DNA sequences. Nuclear DNA sequences, for example, may be altered by site-directed mutagenesis.

Human Regulatory T Cell Production Using TET Family Proteins

The mechanisms underlying the methylation and demethylation status of mammalian cells are areas of active research. Most gene regulation is transitory, depending on the current state of the cell and on changes in external stimuli. Persistent regulation, on the other hand, is a primary role of epigenetic modifications, i.e., heritable regulatory patterns that do not alter the basic genetic coding of the DNA. DNA methylation is the archetypal form of epigenetic regulation and plays a crucial role in maintaining the long-term identity of various cell types.

Tissue-specific methylation also serves to regulate adult cell types and stages, and in some cases a causal relationship between methylation and gene expression has been established. A much-studied example of such cell-type- and cell-status-specific modification of certain gene regions is found during the lineage commitment of naïve T cells to differentiated helper T cells (Th1 or Th2). Naïve (unstimulated) CD4⁺ T cells become activated upon encountering an antigen and become committed to alternative cell fates through further stimulation by interleukins. The two types of helper T cells show reciprocal patterns of gene expression: Th1 cells produce interferon-gamma (IFN-gamma) and silence IL-4, while Th2 cells produce IL-4 and silence IFN-gamma (K. M. Ansel, Nature Immunology 4:616-623 (2003)). For both alternative cell fates, the expression of these genes is inversely correlated with methylation of proximal CpG sites. In Th2 and naïve T cells the IFN-gamma promoter is methylated, but it is not methylated in IFN-gamma-expressing Th1 cells (J. T. Attwood, CMLS 59:241-257 (2002)). Conversely, the entire transcribed region of IL-4 becomes demethylated under Th2-inducing conditions, strongly correlating with efficient transcription of IL-4, whereas in Th1 cells specific untranscribed regions gradually become heavily methylated and IL-4 is not expressed (D. U. Lee, Immunity 16:649-660 (2002)). Furthermore, it has been demonstrated that in naïve T cells the IL-2 promoter is heavily methylated and inactive, but after activation of the naïve T cell, the IL-2 gene undergoes rapid and specific demethylation at six consecutive CpGs. This alteration in methylation patterns occurs concomitantly with cell differentiation and increased production of the IL-2 gene product (D. Bruniquel and R. H. Schwartz, Nat. Immunol. 4:235-40 (2003)). In developing immune cells, demethylation during cell fate decisions occurs either passively, through exclusion of maintenance methylases from the replication fork, or actively, as in the case of IL-2, where a not-yet-identified enzyme actively demethylates the promoter region upon TCR stimulation.

Regulatory T cells, or Treg cells, play an important role in the maintenance of immunological tolerance by suppressing the action of autoreactive effector cells, and are critically involved in preventing the development of autoimmune reactions, making them important and attractive targets for therapeutic applications (S. Sakaguchi, Nat Immunol 6:345-352 (2005)). While a number of cell surface molecules are used to characterize and define Treg cells, the most common being CD4+CD25hi, the transcription factor FOXP3 is specifically expressed in these cells and has been shown to be a critical factor for the development and function of Treg cells.

It has been demonstrated that a conserved 348 bp fragment upstream of the FOXP3 transcription start site contains a minimal promoter necessary for induction of FOXP3 expression (P. Y. Mantel, J. Immunol. 176(6):3593-602 (2006)). Analysis of the methylation status of a stretch of 8 closely spaced CpG dinucleotides demonstrated that naturally occurring regulatory T cells display a completely demethylated promoter region. In contrast, induced CD4+CD25hi cells, as well as unstimulated and restimulated CD4+CD25lo cells, displayed a partially methylated promoter region (P. C. Janson, PLoS ONE 3(2) (2008)). Various data demonstrate that activation of CD4+CD25lo cells results in partial demethylation of the human FOXP3 promoter, and that the speed of demethylation correlates with proliferation, indicating a mechanism of passive demethylation. Importantly, in contrast to the mouse system, the addition of TGF-β during culture of human regulatory T cells does not result in Treg-like demethylation at the human FOXP3 promoter, highlighting the need for alternative means of modulating the methylation status at the FOXP3 locus for the generation of stable human regulatory T cell lines.
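The distinction drawn above between a completely demethylated and a partially methylated FOXP3 promoter is typically quantified from bisulfite-sequenced clones, with one methylated/unmethylated call per CpG. The Python sketch below assumes that input format (one string per clone: 'M' methylated, 'U' unmethylated, '-' no call, one character per CpG). The format, the function names, and the zero-tolerance cutoff for "completely demethylated" are assumptions, not details taken from the cited studies.

```python
import math
from typing import Sequence


def per_cpg_methylation(clones: Sequence[str]) -> list[float]:
    """Fraction of clones methylated at each CpG ('M'/'U' calls, '-' = no call)."""
    if not clones:
        raise ValueError("no clones supplied")
    n_cpg = len(clones[0])
    if any(len(c) != n_cpg for c in clones):
        raise ValueError("all clones must cover the same CpGs")
    fractions = []
    for i in range(n_cpg):
        calls = [c[i] for c in clones if c[i] in "MU"]
        fractions.append(calls.count("M") / len(calls) if calls else math.nan)
    return fractions


def mean_methylation(fractions: Sequence[float]) -> float:
    """Average methylation over CpGs that have at least one call."""
    called = [f for f in fractions if not math.isnan(f)]
    return sum(called) / len(called) if called else math.nan


def is_fully_demethylated(fractions: Sequence[float], tolerance: float = 0.0) -> bool:
    """Hypothetical rule: every called CpG is at or below `tolerance` methylation."""
    called = [f for f in fractions if not math.isnan(f)]
    return bool(called) and all(f <= tolerance for f in called)


# Illustrative data only: 4 clones x 8 CpGs each.
natural_treg = ["UUUUUUUU"] * 4
induced = ["MMMMUUUU", "UUUUUUUU", "MMMMUUUU", "UUUUUUUU"]
print(is_fully_demethylated(per_cpg_methylation(natural_treg)))  # True
print(mean_methylation(per_cpg_methylation(induced)))            # 0.25
```

Reporting per-CpG fractions alongside the mean keeps partial methylation at individual sites visible, which a single average would hide.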

The importance of demethylation at the FOXP3 locus was demonstrated by the finding that adding the DNA methylation inhibitor 5-azacytidine to in vitro-derived human regulatory T cell cultures was sufficient to induce stable FOXP3 expression, and that 5-azacytidine also stabilized TGF-β-induced FOXP3+ Treg cells in restimulation cultures. Similarly, blocking the maintenance of DNA methylation by pharmacological inhibition of DNA methyltransferase-1 induced significant and stable activation-dependent FOXP3 expression in cycling conventional T cells, which was further amplified by co-treatment with TGF-β.

Taken together, the results thus far demonstrate that epigenetic modification, which results in imprinting of FOXP3 expression and stable Treg populations, is not restricted to naturally occurring Treg cells differentiating within the thymus, but can still be initiated in peripheral FOXP3− T cells. Furthermore, the data indicate that stable conversion of CD25−CD4+ T cells into FOXP3+ Treg cells can only occur under conditions that also induce epigenetic fixation of the Treg phenotype by modulating the methylation status of the DNA at the FOXP3 locus. However, the biological signals leading to this modulation of the methylation status at the FOXP3 locus remain elusive.

One object of the present invention is to provide an improved method of generating stable regulatory T cells.

Accordingly, one aspect of the present invention provides a method for improving the generation of stable human regulatory FOXP3+ T cells, the method comprising contacting a human T cell with, or delivering to a human T cell, an effective 5-methylcytosine-to-5-hydroxymethylcytosine converting amount of one or more catalytically active TET family enzymes, functional TET family derivatives, TET catalytic fragments, or any combination thereof. In one embodiment, the entire TET1, TET2, TET3, or CXXC4 protein, a nucleic acid encoding such a protein, or any combination thereof is used. In one embodiment, only the active hydroxylation-inducing portion of TET1, TET2, TET3, or CXXC4, a nucleic acid encoding such a fragment, or any combination thereof is used.

In connection with “contacting with” or “delivering to” a cell a TET family enzyme, functional TET family derivative, TET catalytic fragment thereof, or any combination thereof, the phrase “improving the generation of stable human regulatory FOXP3+ cells” indicates that the percentage of stable human regulatory FOXP3+ cells in a given population is at least 5% higher in populations treated with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof, relative to a comparable control population in which no TET family enzyme, functional TET family derivative, or TET catalytic fragment is present. In one embodiment, the percentage of stable human regulatory FOXP3+ cells in a population treated with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof is at least 10% higher, at least 15% higher, at least 20% higher, at least 25% higher, at least 30% higher, at least 35% higher, at least 40% higher, at least 45% higher, at least 50% higher, at least 55% higher, at least 60% higher, at least 65% higher, at least 70% higher, at least 75% higher, at least 80% higher, at least 85% higher, at least 90% higher, at least 95% higher, at least 1-fold higher, at least 1.5-fold higher, at least 2-fold higher, at least 5-fold higher, at least 10-fold higher, at least 25-fold higher, at least 50-fold higher, at least 100-fold higher, at least 1000-fold higher, or more than in a control treated population of comparable size and culture conditions. The phrase “control treated population of comparable size and culture conditions” is used herein to describe a population of cells that has been treated with identical media, viral induction, nucleic acid sequences, temperature, confluency, flask size, pH, etc., with the exception of the addition of a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytic fragment thereof.

By the phrase “stable human regulatory FOXP3+ T cells” is meant a population of CD4 T cells that maintains expression of the transcription factor FOXP3 upon repeated T cell stimulation in the absence of exogenous regulatory T cell differentiation factors, such as, but not limited to, TGF-β. Such “stable human regulatory FOXP3+ T cells” possess functions known to be characteristic of human regulatory T cells, for example, but not limited to, the ability to suppress the proliferation of naïve CD4+CD25− cells in a dose-dependent manner, as assayed by techniques familiar to those in the art, including, but not limited to, tritiated-thymidine incorporation and CFSE assays.
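Suppression in such assays is commonly expressed as the percentage reduction in responder proliferation caused by adding Treg cells, computed at several Treg:responder ratios. Dose dependence means that suppression falls as the proportion of Treg cells falls. The Python sketch below assumes proliferation is summarized as one number per condition (e.g., thymidine counts per minute, or the percentage of CFSE-diluted responders). The readout values and helper names are hypothetical.

```python
def percent_suppression(with_treg: float, responders_alone: float) -> float:
    """% reduction in responder proliferation caused by adding Treg cells."""
    if responders_alone <= 0:
        raise ValueError("responder-only proliferation must be positive")
    return (1.0 - with_treg / responders_alone) * 100.0


def is_dose_dependent(suppression_by_ratio: dict[float, float]) -> bool:
    """True if suppression never rises as the Treg:responder ratio is lowered."""
    ratios = sorted(suppression_by_ratio, reverse=True)  # e.g. 1:1, 1:2, 1:4 ...
    values = [suppression_by_ratio[r] for r in ratios]
    return all(a >= b for a, b in zip(values, values[1:]))


# Hypothetical counts per minute from a tritiated-thymidine readout.
responders_only_cpm = 20000.0
cpm_with_treg = {1.0: 5000.0, 0.5: 10000.0, 0.25: 15000.0}  # Treg:responder ratio -> cpm
curve = {r: percent_suppression(c, responders_only_cpm) for r, c in cpm_with_treg.items()}
print(curve)                     # {1.0: 75.0, 0.5: 50.0, 0.25: 25.0}
print(is_dose_dependent(curve))  # True
```

The monotonicity check is deliberately strict. Replicate wells with measurement noise would usually be averaged, or compared with a tolerance, before applying it.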

The production of human regulatory FOXP3+ T cells, as practiced by those skilled in the art, is generally achieved by purifying CD4+ cells from a human source and culturing and expanding the CD4+ cells in the presence of agents that non-specifically activate the T cell receptor, together with cytokines and/or growth factors known to promote survival, growth, function, differentiation, or a combination thereof, of the regulatory T cell lineage. It is to be understood that the CD4+ T cells may be obtained from in vivo sources, such as, for example, peripheral blood, leukapheresis blood product, apheresis blood product, peripheral lymph nodes, gut-associated lymphoid tissue, spleen, thymus, cord blood, mesenteric lymph nodes, liver, sites of immunologic lesions (e.g., synovial fluid), pancreas, cerebrospinal fluid, tumor samples, granulomatous tissue, or any other source where such cells may be obtained. It is to be understood that any technique that enables separation of CD4 T cells for use in the methods and assays of the invention may be employed, such as flow cytometric sorting, magnetic bead assays (negative or positive selection), or a combination of such methods, and is to be considered part of this invention.

Cytokines and growth factors, it is to be understood, may include polypeptide and non-polypeptide factors. As defined herein, a “cytokine” is any of a number of substances, including proteins, peptides, and glycoproteins, that are secreted by specific cells of the immune system, carry signals locally between cells, and thus have an effect on other cells. Cytokines include lymphokines, interleukins, and chemokines, and can be classified into: (1) the four α-helix bundle family, which is further divided into three subfamilies (the IL-2 subfamily, the interferon (IFN) subfamily, and the IL-10 subfamily); (2) the IL-1 family, which primarily includes IL-1 and IL-18; and (3) the IL-17 family, which has yet to be completely characterized, though its member cytokines have a specific effect in promoting proliferation of T cells that cause cytotoxic effects.

A “growth factor”, as the term is defined herein, refers to a naturally occurring substance capable of stimulating cellular growth, proliferation, and cellular differentiation. A growth factor may be a protein or a steroid hormone. A cytokine may be a growth factor. Some non-limiting examples of growth factor families include: Bone morphogenetic proteins (BMPs), Epidermal growth factor (EGF), Erythropoietin (EPO), Fibroblast growth factor (FGF), Granulocyte colony-stimulating factor (G-CSF), Granulocyte-macrophage colony-stimulating factor (GM-CSF), Growth differentiation factor-9 (GDF9), Hepatocyte growth factor (HGF), Hepatoma-derived growth factor (HDGF), Insulin-like growth factor (IGF), Myostatin (GDF-8), Nerve growth factor (NGF) and other neurotrophins, Platelet-derived growth factor (PDGF), Thrombopoietin (TPO), Transforming growth factor alpha (TGF-α), Transforming growth factor beta (TGF-β), and Vascular endothelial growth factor (VEGF).

In general, successful generation of human regulatory FOXP3+ T cells, as practiced by one of skill in the art, is accomplished by culturing purified CD4+ T cells in the presence of anti-CD3 and anti-CD28 antibodies as T cell receptor stimulating agents, and promoting the differentiation of human regulatory FOXP3+ T cells by adding TGF-β to the culture medium. The isolated CD4+ cells cultured under such conditions can then be assessed for expression of cell surface markers characteristic of the regulatory T cell lineage, such as, but not limited to, CD25, using techniques standard in the art. It is to be understood that the isolated, culture-expanded human regulatory FOXP3+ T cells of this invention may express, in addition to CD25 and CD4, any number or combination of cell surface markers, as described herein and as is well known in the art, and are to be considered part of this invention. The isolated CD4+ T cells cultured under such conditions can also be assessed for expression of FOXP3, the transcription factor defining the regulatory T cell lineage, using techniques known in the art, for example, but not limited to, intracellular flow cytometric analysis using a labeled FOXP3-specific monoclonal antibody that can be detected with a flow cytometer.
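Once the stained cells have been acquired, the CD25 and FOXP3 readout described above reduces to counting events inside gates. The Python sketch below assumes simple rectangular threshold gates on compensated intensities. In practice, gate positions are set from isotype or fluorescence-minus-one controls. The `Event` type, the cutoffs, and the intensity values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """One cytometer event: fluorescence intensities for three stains (arbitrary units)."""
    cd4: float
    cd25: float
    foxp3: float


def treg_fraction(events: Iterable[Event], cd4_min: float, cd25_min: float,
                  foxp3_min: float) -> float:
    """Fraction of CD4+ events that are also CD25+ and FOXP3+ (threshold gates)."""
    cd4_pos = [e for e in events if e.cd4 >= cd4_min]
    if not cd4_pos:
        return 0.0
    tregs = [e for e in cd4_pos if e.cd25 >= cd25_min and e.foxp3 >= foxp3_min]
    return len(tregs) / len(cd4_pos)


# Hypothetical intensities and gate cutoffs.
events = [
    Event(cd4=900, cd25=800, foxp3=700),  # CD4+CD25+FOXP3+
    Event(cd4=850, cd25=100, foxp3=50),   # CD4+ conventional
    Event(cd4=880, cd25=750, foxp3=60),   # CD4+CD25+ but FOXP3-
    Event(cd4=910, cd25=90, foxp3=40),    # CD4+ conventional
    Event(cd4=50, cd25=900, foxp3=800),   # CD4-, excluded from the denominator
]
print(treg_fraction(events, cd4_min=500, cd25_min=500, foxp3_min=500))  # 0.25
```

The third event shows why FOXP3 staining is measured in addition to CD25. Activated conventional T cells can be CD25+ without expressing FOXP3.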

Accordingly, in one embodiment, the method of generating human regulatory FOXP3+ T cells further comprises contacting the human T cell with a composition comprising at least one cytokine, growth factor, or activating reagent. In one embodiment, the composition comprises TGF-β.

Compositions and Methods for Detecting 5-Methylcytosine and 5-Hydroxymethylcytosine

The invention is based, in part, upon identification of a novel and surprising enzymatic activity for the family of TET proteins, namely TET1, TET2, TET3, and CXXC4. The novel activity is the hydroxylase activity of the TET family enzymes, which converts the cytosine nucleotide 5-methylcytosine into 5-hydroxymethylcytosine. There are currently no techniques or reagents to detect or map 5-hydroxymethylcytosine residues in genomes, as this nucleotide is recognized neither by the 5-methylcytosine binding protein MeCP2 (V. Valinluck, Nucleic Acids Research 32:4100-4108 (2004)) nor by existing monoclonal antibodies specific for 5-methylcytosine. Hence, reagents and methods to detect 5-hydroxymethylcytosine are required.

Accordingly, one object of the present invention is directed towards compositions and methods for the detection of 5-methylcytosine and 5-hydroxymethylcytosine nucleotides in a nucleic acid, such as DNA, in a biological sample.

In one embodiment, an assay based on thin-layer chromatography (TLC) is used. Briefly, DNA is extracted from cells and digested with a methylation-insensitive restriction enzyme that cuts the DNA regardless of whether the internal cytosine in the CG dinucleotide is methylated. Preferably, the restriction enzyme cuts within CCGG sequences, and more preferably the enzyme is MspI. Alternatively, the enzyme cuts within TCGA, and the restriction enzyme used is TaqαI. The restricted DNA is then treated with an agent, such as calf intestinal phosphatase, to remove the newly exposed 5′ phosphate. The DNA is then end-labeled, for example with T4 polynucleotide kinase and [γ-32P]ATP, to yield fragments that are almost exclusively labeled on the newly exposed 5′ cytosine, regardless of its methylation status. The DNA fragments are then digested to liberate dNMPs (deoxynucleoside 5′-monophosphates) using agents such as, for example, snake venom phosphodiesterase and DNase I. The dNMPs can then be separated on cellulose TLC plates and excised for nucleotide identification. To confirm the presence of 5-hydroxymethylcytosine in a sample, a known biological source of the nucleotide may be used as a reference, such as T-even phages grown in E. coli lacking GalU (the enzyme that catalyzes formation of the glucose donor UDP-glucose) and the McrA and McrB1 components of McrBC. Such phages contain exclusively unglucosylated 5-hydroxymethylcytosine in place of cytosine, and their nucleotides can be used to compare migration patterns with those of the nucleotides present in the sample.
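The reason the label reports cytosine modification status is that MspI cuts C^CGG and TaqαI cuts T^CGA. In both cases, the first base of the downstream fragment, which receives the 5′-32P label, is the cytosine of the central CpG. The Python sketch below tallies which cytosine species would be labeled for a given sequence and a set of known modifications. It models only the top strand and ignores labeling efficiency and digestion completeness. The `mods` format, the ASCII key "TaqaI" for TaqαI, and the function names are assumptions made for illustration.

```python
from collections import Counter

# Recognition sequence, and index within it of the first base 3' of the cut.
# MspI: C^CGG, TaqaI (TaqαI): T^CGA; either way that base is the C of the CpG.
ENZYMES = {"MspI": ("CCGG", 1), "TaqaI": ("TCGA", 1)}


def labeled_5prime_bases(seq: str, mods: dict[int, str], enzyme: str = "MspI") -> Counter:
    """Tally which cytosine species would carry the 5'-32P label after digestion.

    `mods` maps 0-based top-strand positions to "5mC" or "5hmC"; any other
    cytosine is counted as unmodified "C". Only the top strand is modeled.
    """
    motif, first_base = ENZYMES[enzyme]
    seq = seq.upper()
    counts: Counter = Counter()
    start = seq.find(motif)
    while start != -1:
        pos = start + first_base  # newly exposed 5' base of the downstream fragment
        counts[mods.get(pos, "C")] += 1
        start = seq.find(motif, start + 1)
    return counts


def fraction_5hmC(counts: Counter) -> float:
    """Share of labeled spots that would migrate as 5-hydroxymethylcytosine."""
    total = sum(counts.values())
    return counts["5hmC"] / total if total else 0.0


seq = "AACCGGTTCCGGAATCGA"
spots = labeled_5prime_bases(seq, {3: "5mC", 9: "5hmC"})
print(dict(spots))           # {'5mC': 1, '5hmC': 1}
print(fraction_5hmC(spots))  # 0.5
```

Because only cytosines inside the enzyme's recognition sites are labeled, the TLC readout reflects the modification state of that subset of CpGs rather than of all genomic cytosines.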

In addition, the methods and compositions described herein generally involve direct detection of 5-methylcytosine and 5-hydroxymethylcytosine nucleotides with agents that recognize and specifically bind to 5-methylcytosine and 5-hydroxymethylcytosine nucleotides in a nucleic acid sequence. These methods and compositions can be used singly or in combination to determine the hydroxymethylation status of cellular DNA or sequence information. In one embodiment, these methods and compositions can be used to detect 5-hydroxymethylcytosine in cell nuclei for the purposes of immunohistochemistry. In another embodiment, these methods and compositions can be used to immunoprecipitate DNA fragments containing 5-hydroxymethylcytosine from crosslinked DNA by chromatin immunoprecipitation (ChIP). The identity of such fragments can then be determined by deep sequencing (ChIP-seq) or by hybridizing the fragments to genomic tiling arrays.

Accordingly, one embodiment comprises providing an antibody or antigen-binding fragment thereof that specifically binds to 5-hydroxymethylcytosine. The antibody or antigen-binding fragment thereof can be contacted with a biological sample under conditions effective to yield a detectable signal if 5-hydroxymethylcytosine is present in the sample and the antibody or antigen-binding fragment thereof binds to the 5-hydroxymethylcytosine. A determination can then be made as to whether the sample yields a detectable signal, where the presence of the detectable signal indicates that the sample contains 5-hydroxymethylcytosine. Such a determination can be made using any equipment that detects the signal, such as a microscope (fluorescence or electron) or a flow cytometer.

In one embodiment, the 5-hydroxymethylcytosine nucleotide is detected using a hydroxymethylation-specific antibody, hydroxymethylation-specific antigen-binding fragment thereof, or hydroxymethylation-specific protein.

The methylation of cytosine residues occurs in the DNA of many organisms from plants to mammals and is believed to play a critical role in gene regulation. There is considerable research into the mechanisms by which patterns of cytosine methylation change during the differentiation of cells and in states of disease. Furthermore, cytosine methylation patterns are believed to serve as a functional “fingerprint” of different normal and diseased cell types and of the same cell type at various stages of differentiation, and thus mapping the sites of cytosine methylation on a genome-wide scale is a subject of research.

Novel compositions and methods are provided herein that (1) enable covalent enzymatic tagging of methylcytosine in polynucleotides, and detection of the covalent tag; (2) enable covalent enzymatic tagging of 5-hydroxymethylcytosine in polynucleotides, and detection of the covalent tag; and (3) enable detection of 5-hydroxymethylcytosine through chemical modification, such as bisulfite treatment. The compositions and methods for tagging, modification, detection and isolation further provide, in part, numerous downstream applications for analysis of methylcytosine and 5-hydroxymethylcytosine in polynucleotides, including but not limited to, genome-wide analysis of methylcytosine and 5-hydroxymethylcytosine patterns in normal and diseased DNA. The compositions and methods of the invention significantly expand the current state of the art, and can be immediately applied to basic research, clinical diagnostics, and drug screening applications.

This invention describes, in part, a method to covalently tag and detect naturally occurring 5-hydroxymethylcytosine in nucleic acids, such as DNA, for multiple applications. As has been described herein, we have shown that 5-hydroxymethylcytosine is present in mammalian DNA, which, without wishing to be bound by a theory, may exist as an intermediate during changes in methylation status of the genome. As described herein, modification of methylcytosine to 5-hydroxymethylcytosine is catalyzed through the action of the novel TET family of enzymes. Without wishing to be bound by a theory, we believe that 5-hydroxymethylcytosine in DNA is subsequently converted into unmethylated cytosine. 5-hydroxymethylcytosine in DNA may also serve other functions.

As is described herein, in some aspects, methods are provided wherein a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof is contacted with a nucleic acid, such as DNA or RNA, to convert methylcytosine in nucleic acids to 5-hydroxymethylcytosine. In some embodiments, the nucleic acids are contacted in vitro. In some embodiments, the nucleic acids are contacted in a cell. In some embodiments, the nucleic acids are contacted in vivo, in a living animal, preferably a mammal, for example, a human.

Compositions and methods to detect and map methylated and hydroxymethylated cytosine residues in genomes have numerous applications. Several techniques are currently utilized to map methylated cytosine residues. One method involves a chemical reaction of nucleic acids with sodium hydrogen sulfite (bisulfite), which sulfonates unmethylated cytosine but does not efficiently sulfonate methylated cytosine. The sulfonated unmethylated cytosine is prone to spontaneous deamination, which yields sulfonated uracil. The sulfonated uracil can then be desulfonated to uracil at high (alkaline) pH. The base-pairing properties of the pyrimidines uracil and cytosine are fundamentally different: uracil in DNA is recognized as the equivalent of thymine and therefore is paired with adenine during hybridization or polymerization of DNA, whereas cytosine is paired with guanine during hybridization or polymerization of DNA. Performance of genomic sequencing or PCR on bisulfite-treated DNA can therefore be used to distinguish unmethylated cytosine in the genome, which has been converted to uracil by bisulfite/deamination/desulfonation, from methylated cytosine, which has remained unconverted. This technique is amenable to large-scale screening approaches when combined with other technologies such as microarray hybridization and high-throughput sequencing.
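The readout logic of bisulfite sequencing can be illustrated with a short simulation. This is a minimal sketch under idealized assumptions (complete conversion, a single strand, no PCR error), and the one-letter encoding, 'M' for 5-methylcytosine and 'H' for 5-hydroxymethylcytosine, is a hypothetical convention introduced here. Because 5-hydroxymethylcytosine is converted to cytosine-5-methylenesulfonate, which is not deaminated, 'H' is modeled as reading as C; this is why bisulfite sequencing alone does not separate 5-methylcytosine from 5-hydroxymethylcytosine.

```python
# Hypothetical encoding: 'C' = unmodified cytosine, 'M' = 5-methylcytosine,
# 'H' = 5-hydroxymethylcytosine; A, G and T are unchanged by bisulfite.
BISULFITE_READOUT = {"C": "T", "M": "C", "H": "C"}


def bisulfite_read(strand: str) -> str:
    """Sequence expected after ideal bisulfite conversion and PCR.

    Unmodified C -> U (read as T); 5mC and 5hmC (as cytosine-5-methylenesulfonate)
    resist deamination and read as C.
    """
    return "".join(BISULFITE_READOUT.get(base, base) for base in strand.upper())


def reference_sequence(strand: str) -> str:
    """Collapse modified cytosines back to plain C, as in a reference genome."""
    return "".join("C" if base in "CMH" else base for base in strand.upper())


def call_cytosines(reference: str, read: str) -> list[tuple[int, str]]:
    """Classify each reference C by what the bisulfite read shows at that position."""
    calls = []
    for i, (ref, obs) in enumerate(zip(reference.upper(), read.upper())):
        if ref != "C":
            continue
        if obs == "C":
            calls.append((i, "protected (5mC or 5hmC)"))
        elif obs == "T":
            calls.append((i, "unmethylated"))
        else:
            calls.append((i, "ambiguous"))
    return calls
```

For example, the strand "ACMGTHCA" yields the read "ATCGTCTA", in which the unmodified cytosines at positions 1 and 6 read as T while the modified cytosines at positions 2 and 5 read as C.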

As described, the invention provides, in one aspect, a method of detecting 5-hydroxymethylcytosine in complex genomes using bisulfite treatment of nucleic acids, such as DNA. The method comprises, in part, contacting a nucleic acid of interest, such as isolated genomic DNA or an oligonucleotide, with an effective amount of sodium bisulfite to convert any 5-hydroxymethylcytosine present in the nucleic acid to cytosine-5-methylenesulfonate. The bisulfite-treated nucleic acid is then digested with an enzyme, such as a methyl-sensitive enzyme, and the nucleic acid is end-labeled. In one embodiment, the enzyme is MseI. In one embodiment, the nucleic acid is end-labeled, for example, using ³²P. The digested and labeled nucleic acid is then contacted with an antiserum, antibody or antigen-binding fragment thereof specific for cytosine-5-methylenesulfonate. The contacted nucleic acid can then be immobilized using, for example, beads specific for the species and isotype of the antiserum, antibody or antigen-binding fragment thereof. In one embodiment, the beads comprise anti-rabbit IgG beads. The amount of 5-hydroxymethylcytosine in the immobilized nucleic acid can then be determined by obtaining the radiation counts, for example, with a scintillation counter. In other embodiments of the aspect, the antibody or antigen-binding fragment is directly labeled. In some embodiments, the label is a fluorescent label or an enzymatic substrate. In some embodiments, the nucleic acid is contacted in vitro. In some embodiments, the nucleic acid is contacted in a cell. In some embodiments, the nucleic acid is contacted in vivo.
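One way to turn the scintillation counts from this immunoprecipitation into a comparable number is to express the background-corrected bound counts as a fraction of the labeled input. The sketch below assumes, hypothetically, that a beads-only (no antiserum) control is run in parallel and that an aliquot of the labeled input is counted directly; the function name and example values are illustrative and not part of the described method.

```python
def fraction_immunoprecipitated(ip_cpm: float, background_cpm: float,
                                input_cpm: float) -> float:
    """Background-corrected fraction of end-labeled DNA retained on the beads.

    ip_cpm: counts from the anti-cytosine-5-methylenesulfonate pulldown.
    background_cpm: counts from a parallel beads-only control (assumed).
    input_cpm: counts from the labeled input, scaled to the amount used per pulldown.
    """
    if input_cpm <= 0:
        raise ValueError("input counts must be positive")
    net = max(ip_cpm - background_cpm, 0.0)  # negative net signal is treated as zero
    return net / input_cpm
```

Comparing this fraction between samples processed in parallel gives a relative, rather than absolute, measure of 5-hydroxymethylcytosine content.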

In some embodiments, the ability of a test inhibitor to inhibit TET family enzymatic activity can be determined using the methods described herein. For example, genomic DNA is isolated from cells treated with one or more test inhibitors of TET family enzymatic activity, such as siRNAs, and undergoes bisulfite treatment as described herein. The presence of less cytosine-5-methylenesulfonate in a sample treated with the test inhibitor(s) of TET family enzymatic activity compared with a sample to which no test inhibitor(s) was added is indicative of the ability of the test inhibitor to inhibit TET family activity.
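The comparison described above reduces to a ratio of cytosine-5-methylenesulfonate signals between treated and untreated samples. The sketch below is a minimal illustration; averaging replicate measurements and the 0.5 cutoff for flagging a candidate inhibitor are hypothetical choices, not values taken from this disclosure.

```python
from statistics import mean


def relative_cms_signal(treated: list[float], untreated: list[float]) -> float:
    """Mean CMS signal with the test inhibitor divided by mean signal without it."""
    control = mean(untreated)
    if control <= 0:
        raise ValueError("untreated control must have a positive signal")
    return mean(treated) / control


def flags_inhibition(treated: list[float], untreated: list[float],
                     max_relative_signal: float = 0.5) -> bool:
    """True if the treated signal falls below the (hypothetical) cutoff fraction."""
    return relative_cms_signal(treated, untreated) < max_relative_signal
```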

In other embodiments, the methods described herein to detect cytosine-5-methylenesulfonate in a sample can be used to test whether a patient having a mutation, single nucleotide polymorphism, or other genetic difference in a TET family member genomic sequence has decreased 5-hydroxymethylcytosine.

In other embodiments, the methods of the aspect can be used to isolate a nucleic acid having one or more 5-hydroxymethylcytosine residues, for use, for example, in chromatin immunoprecipitation assays. Such isolated nucleic acids can then be sequenced or subjected to PCR amplification and subsequent sequencing to identify the genomic regions having 5-hydroxymethylcytosine residues.

As described herein, the invention provides, in one aspect, novel and significant improvements for detecting 5-methylcytosine and 5-hydroxymethylcytosine in complex genomes. In some embodiments, a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof is provided to efficiently convert methylcytosine in nucleic acids to 5-hydroxymethylcytosine. In some embodiments, compositions and methods are provided for using specific and efficient enzymes to convert methylcytosine residues in nucleic acids to glucosylated-5-hydroxymethylcytosine residues and gentibiose-containing-5-hydroxymethylcytosine residues. In some embodiments, the nucleic acids are contacted in vitro. In some embodiments, the nucleic acids are contacted in a cell. In some embodiments, the nucleic acids are contacted in vivo.

Another method currently used to distinguish methylated versus unmethylated cytosine in genomes is by use of methylation-sensitive restriction enzymes (MSRE). Cytosine methylation in certain sequence contexts prevents cleavage by MSRE, whereas other enzymes are able to cleave the identical sequence regardless of cytosine methylation status. This differential sensitivity to cytosine methylation can be used to quantitatively determine the degree of methylation in particular stretches of sequence in the genome. Limitations of this method are that it is less amenable to large-scale approaches, and analysis is limited to methylation within recognition sites of the restriction enzymes.
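A common way to make the MSRE readout quantitative (an assumption here, since the text does not specify the readout) is quantitative PCR across a single restriction site, comparing enzyme-digested with mock-digested DNA. If methylation blocks the enzyme, the template fraction surviving digestion approximates the methylated fraction at that site. The sketch assumes equal template input and a known, constant amplification efficiency; the function name is hypothetical.

```python
def fraction_resistant(ct_digested: float, ct_mock: float,
                       efficiency: float = 1.0) -> float:
    """Fraction of template surviving MSRE digestion, estimated from qPCR Ct values.

    efficiency: per-cycle amplification efficiency (1.0 means doubling each cycle).
    A later Ct for the digested sample means less intact template remained.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    delta_ct = ct_digested - ct_mock
    # Clamp at 1.0: a digested sample cannot contain more template than the mock.
    return min(1.0, (1.0 + efficiency) ** (-delta_ct))
```

For an enzyme that cuts regardless of methylation, the same calculation can serve as a digestion-completeness control, where values near zero would be expected.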

As described herein, the invention provides, in one aspect, novel and significant improvements for detecting methylcytosine in complex genomes. The compositions and methods, as described herein, will allow tagging and analysis of all methylated cytosine residues in the genome, as opposed to the limited analysis obtained with MSRE.

A third method used to distinguish methylated versus unmethylated cytosine in genomes is via affinity purification of methylated cytosine using antibodies or protein domains (e.g. MBD2) that specifically bind to the methylated cytosine residue. Methylated cytosine-containing DNA is bound by these affinity reagents and then enriched by binding of the affinity reagent to a solid support or other separation strategy. Further analysis such as microarray hybridization and high-throughput sequencing can be performed on either the bound fraction enriched for methylated cytosine-containing DNA, or the unbound fraction enriched for unmethylated cytosine. This technique has the advantage of enriching regions of interest for further analysis, such as high-throughput sequencing of methylated or unmethylated cytosine in genomes. One limitation of this method is that it depends heavily on the binding affinity and specificity of the given methylated cytosine binding protein, since the binding of these reagents is noncovalent. Another limitation of this method is that it measures density of methylation in a given genomic region, and will not be as sensitive to areas with sparse methylation target sites.

The compositions and methods of the invention provide, in one aspect, improved affinity purification of DNA containing methylated cytosine, by adding covalent tags and/or chemical modifications to methylated cytosine and 5-hydroxymethylated cytosine residues. This is because, as described herein, detection reagents against glucosylated 5-hydroxymethylcytosine, gentibiose-containing 5-hydroxymethylcytosine DNA and chemically modified 5-methylenesulfonate hydroxymethylcytosine are either covalently bound or non-covalently bound with a much higher affinity and specificity than that currently achievable by methylcytosine affinity reagents.

In addition, as described herein, novel compositions and methods are provided for detecting methylated and hydroxymethylated cytosine in complex genomes. Such compositions and methods utilize the properties of certain enzymes to efficiently and specifically add glucose residues to hydroxymethylcytosine in DNA. Enzymes encoded by bacteriophages of the “T even” family have these properties, and those enzymes that add glucose in the alpha configuration are called alpha-glucosyltransferases (AGT), while those enzymes that add glucose in the beta configuration are called beta-glucosyltransferases (BGT). T2, T4, and T6 bacteriophages encode AGTs, but only T4 bacteriophages encode BGT. Amino acids important for the activity of T4 alpha-glucosyltransferases are His-Asp-His (amino acids 114-116) (L. Lariviere, J Mol Biol (2005) 352, 139). Amino acids important for the activity of T4 beta-glucosyltransferases are Asp-Ile-Arg-Leu (amino acids 100-103) (SEQ ID NO: 17), Met (amino acid 231) and Glu (amino acid 311) (L. Lariviere, J Mol Biol (2003) 330, 1077). T2 and T6 bacteriophages possess an additional activity that further modifies glucosylated hydroxymethylcytosine by adding another glucose molecule in the beta configuration. This enzyme is called beta-glucosyl-alpha-glucosyl-transferase (BGAGT). Addition of the second glucose results in the formation of a disaccharide containing two glucose molecules linked in a beta-1-6 configuration, which is known as gentibiose or gentiobiose. The glucose donor used by AGT, BGT, and BGAGT is called uridine diphosphate glucose (UDPG).

In some embodiments of this aspect, enzymes encoded by bacteriophages of the “T even” family are provided that add glucose molecules to 5-hydroxymethylcytosine residues in nucleic acids. In one embodiment, the 5-hydroxymethylcytosine is naturally occurring. In one embodiment, the 5-hydroxymethylcytosine occurs through contacting DNA with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, thereby converting methylcytosine to hydroxymethylcytosine. In one embodiment, the enzyme provided is an alpha-glucosyltransferase. In one embodiment, the alpha-glucosyltransferases provided are encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the enzyme is a beta-glucosyltransferase. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In some embodiments, enzymes encoded by bacteriophages of the “T even” family add two glucose molecules linked in a beta-1-6 configuration to hydroxymethylcytosine to form gentibiose-containing-hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In some embodiments, the nucleic acids are in vitro. In some embodiments, the nucleic acids are in a cell. In some embodiments, the nucleic acids are in vivo.

As defined herein, a “naturally occurring” 5-hydroxymethylcytosine residue is one which is found in a sample in the absence of any external manipulation or activity. For example, a “naturally occurring 5-hydroxymethylcytosine residue” is one found in an isolated nucleic acid that is present due to normal genomic activities, such as, for example, gene silencing mechanisms.

In some embodiments of this aspect, the addition of glucose or gentibiose molecules to 5-hydroxymethylcytosine residues provides a method to detect nucleic acids containing hydroxymethylated cytosines. In some embodiments, the method to detect the hydroxymethylated cytosine utilizes radiolabeled glucose and glucose derivative donor substrates. In one such embodiment, the nucleic acid is incubated with an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase in the presence of radiolabeled uridine diphosphate glucose (UDPG), and the DNA is purified and analyzed by liquid scintillation counting, autoradiography or other means. In one such embodiment, the UDPG is radiolabeled with ¹⁴C. In one embodiment, the UDPG is radiolabeled with ³H.
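Converting the counts from radiolabeled UDPG into an amount of transferred glucose is simple arithmetic, sketched below under stated assumptions: the counting efficiency for the isotope and vial is known, the specific activity of the UDPG is given in mCi/mmol, one glucose is transferred per 5-hydroxymethylcytosine (as with AGT or BGT; gentiobiose formation by BGAGT would transfer a second labeled glucose), and the DNA is double-stranded at roughly 650 g/mol per base pair. The function names and default values are hypothetical.

```python
DPM_PER_MCI = 2.22e9  # 1 mCi = 3.7e7 Bq = 2.22e9 disintegrations per minute
PMOL_PER_MMOL = 1e9


def glucose_pmol(net_cpm: float, counting_efficiency: float,
                 specific_activity_mci_per_mmol: float) -> float:
    """pmol of radiolabeled glucose incorporated into the DNA."""
    dpm = net_cpm / counting_efficiency  # correct counts for detector efficiency
    dpm_per_pmol = specific_activity_mci_per_mmol * DPM_PER_MCI / PMOL_PER_MMOL
    return dpm / dpm_per_pmol


def percent_hmc_of_cytosine(glucose: float, dna_ng: float, gc_fraction: float,
                            glucoses_per_hmc: int = 1,
                            g_per_mol_per_bp: float = 650.0) -> float:
    """5hmC as a percentage of all cytosines, from pmol glucose and DNA mass."""
    bp_pmol = dna_ng / g_per_mol_per_bp * 1000.0  # ng / (g/mol) = nmol; x1000 -> pmol
    cytosine_pmol = bp_pmol * gc_fraction  # one C per G:C base pair
    hmc_pmol = glucose / glucoses_per_hmc
    return 100.0 * hmc_pmol / cytosine_pmol
```

At a specific activity of 300 mCi/mmol, for example, each pmol of glucose corresponds to about 666 dpm, so counting efficiency dominates the practical detection limit.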

In some embodiments of this aspect, proteins that recognize glucose residues are used as a method to detect 5-hydroxymethylated cytosine. In some embodiments, the proteins recognize only the glucose residue. In some embodiments, the proteins recognize the residue in the context of hydroxymethylcytosine. In one embodiment, the protein that recognizes glucose residues is a lectin. In one embodiment, the protein that recognizes glucose residues is an antibody or antibody fragment thereof. In one embodiment, the antibody is modified with several tags and used for solid-phase purification of gentibiose-containing-hydroxymethylcytosine in DNA. In one embodiment, the tags are biotin molecules or beads. In one embodiment, the antibody is modified with gold or fluorescent tags. In one embodiment, the protein that recognizes glucose residues is an enzyme. In one embodiment, the enzyme is a hexokinase or a beta-glucosyl-alpha-glucosyl-transferase.

In other embodiments of this aspect, the addition of glucose to the 5-hydroxymethylcytosine residues provides a method to detect nucleic acids containing hydroxymethylated cytosines. In such embodiments, naturally occurring 5-hydroxymethylcytosine, or 5-hydroxymethylcytosine occurring through contacting DNA with a catalytically active TET family enzyme, a functional TET family derivative, or a TET catalytically active fragment thereof, undergoes conversion to glucosylated 5-hydroxymethylcytosine using the methods described herein. The glucosylated 5-hydroxymethylcytosine is then contacted with sodium periodate to generate aldehyde residues, and the DNA is isolated and precipitated by any method known to one of skill in the art, such as ethanol precipitation. The quantity of aldehyde residues, as determined by one of skill in the art, can then be used to determine the quantity of 5-hydroxymethylcytosine residues. For example, in one embodiment, aldehyde residues can be detected using an aldehyde-specific probe conjugated to a tag, such as an enzyme, non-fluorescent moiety, or fluorescent label. In one embodiment, the aldehyde-specific probe is an aldehyde-reactive biotin, and can be detected by streptavidin conjugated to an enzyme. In some embodiments, the enzyme is horseradish peroxidase. In some embodiments of the aspect, the aldehyde-specific probe can be used to perform specific pulldown of the glucosylated DNA residues, which can be used, for example, to perform chromatin immunoprecipitation assays to determine in vivo sites of genomic 5-hydroxymethylation.

In some embodiments of this aspect, proteins that recognize gentibiosyl residues are used as a method to detect 5-hydroxymethylated cytosine. In some embodiments, enzymes encoded by bacteriophages of the “T even” family add two glucose molecules linked in a beta-1-6 configuration to hydroxymethylcytosine to form gentibiose-containing-hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In some embodiments, the gentibiosyl residue in gentibiose-containing-hydroxymethylcytosine is detected non-covalently. In some embodiments, the non-covalent detection methods utilize proteins with an affinity for the gentibiosyl residue. In one embodiment, the protein is an antibody specific to gentibiose-containing-hydroxymethylcytosine. In one embodiment, the antibody is modified with several tags and used for solid-phase purification of gentibiose-containing-hydroxymethylcytosine in DNA. In one embodiment, the tags are biotin molecules or beads. In one embodiment, the antibody is modified with gold or fluorescent tags. In one embodiment, the protein is a lectin with affinity to gentibiosyl residues. In one embodiment, the lectin is Musa acuminata lectin (BanLec). In one embodiment, the lectin is modified with gold or fluorescent tags. In some embodiments, the proteins with an affinity for the gentibiosyl residue are used to identify gentibiose-containing-hydroxymethylcytosine in DNA using electron microscopy or immunofluorescent detection.

In some embodiments of the aspect, glucose substrates that trap the covalent enzyme-DNA intermediates are used as a method to detect 5-hydroxymethylated cytosine. In some embodiments, enzymes encoded by bacteriophages of the “T even” family add glucose substrates that trap the covalent enzyme-DNA intermediates to 5-hydroxymethylcytosine in DNA. In some embodiments, the glucose substrate is a UDPG analog. In one embodiment, the UDPG analog is uridine diphosphate 2-deoxy-2-fluoroglucose. In some embodiments, the enzyme encoded by bacteriophages of the “T even” family is labeled with a tag to facilitate detection and isolation of the covalently linked enzyme-DNA intermediate. In one embodiment, the tag is a protein. In one embodiment, the tag is not a protein.

In some embodiments of this aspect, the method to detect the hydroxymethylated cytosine uses a chemical that recognizes sugar residues and catalyzes further reactions that enable additional tags to be placed on these sugar residues. In one embodiment, the sugar residue is a glucose or a glucose derivative. In one embodiment, the sugar residue is a gentibiose molecule.

In some embodiments of this aspect, the addition of glucose molecules to hydroxymethylcytosine serves to covalently tag hydroxymethylcytosine for downstream applications. In one such embodiment, the downstream application involves the detection and purification of DNA containing methylcytosine and hydroxymethylcytosine. In some embodiments, the glucose and glucose derivative donor substrates are radiolabeled for detection.

In some embodiments of this aspect, the 5-hydroxymethyl residue of 5-hydroxymethylcytosine residues in nucleic acids is converted to a methylenesulfonate residue after treatment with sodium hydrogen sulfite. In some embodiments, the addition of sulfonate to 5-hydroxymethylcytosine provides a method to detect the hydroxymethylated cytosine residue. In one embodiment, antibodies specific for the 5-methylenesulfonate residue in nucleosides are used. In some embodiments, the nucleic acids are in vitro. In some embodiments, the nucleic acids are in a cell. In some embodiments, the nucleic acids are in vivo.

In some embodiments of this aspect, the addition of glucose, glucose analogs, or sulfonate molecules to methylcytosine and hydroxymethylcytosine serves to covalently or non-covalently tag methylcytosine and hydroxymethylcytosine for downstream applications. In one such embodiment, the downstream application involves the detection and purification of nucleic acids containing methylcytosine and hydroxymethylcytosine. In some embodiments, the glucose and glucose derivative donor substrates are radiolabeled for detection. In some embodiments, the downstream application involves detection of methylcytosine and 5-hydroxymethylcytosine in cells or tissues directly by fluorescence or electron microscopy. In some embodiments, the downstream application involves detection of methylcytosine and 5-hydroxymethylcytosine by assays such as blotting or linked enzyme-mediated substrate conversion with radioactive, colorimetric, luminescent or fluorescent detection. In some embodiments, the downstream application involves separation of the tagged nucleic acids away from untagged nucleic acids by enzymatic, chemical or mechanical treatments, and fractionation of either the tagged or untagged DNA by precipitation with beads, magnetic means, or fluorescent sorting. In some embodiments, this is followed by application to whole-genome analyses such as microarray hybridization and high-throughput sequencing.

Another object of the present invention is to provide methods and assays to screen for signaling pathways that activate or inhibit TET family enzymes at the transcriptional, translational, or posttranslational levels.

Accordingly, one aspect of the invention provides assays for detecting the activity of the TET family of proteins. In one embodiment, an assay for detecting increased hydroxymethylcytosine in vitro using an oligonucleotide containing 5-methylcytosine is provided. In one embodiment, an assay for detecting an increased cytosine-to-methylcytosine ratio in vitro in an oligonucleotide containing 5-methylcytosine is provided. In one embodiment, an assay for detecting increased hydroxymethylcytosine in cellular DNA is provided. In one embodiment, an assay for detecting an increased cytosine-to-methylcytosine ratio in cellular DNA is provided. In another embodiment, an assay for detecting increased hydroxymethylcytosine in transfected plasmid DNA is provided. In one embodiment, an assay for detecting an increased cytosine-to-methylcytosine ratio in transfected plasmid DNA is provided. In another embodiment, an assay for detecting increased activity of a reporter gene that is initially silenced by promoter methylation is provided. In one embodiment, an assay for the detection of other oxidative modifications of pyrimidines in RNA or DNA, in vitro, in cells or in plasmid DNA, is provided.

Another aspect provides a method for detecting factors involved in decreasing the amount of 5-hydroxymethylcytosine residues in a nucleic acid. In some embodiments, the decrease in the amount of 5-hydroxymethylcytosine residues is caused by conversion of 5-hydroxymethylcytosine to cytosine. In some embodiments, the decrease in 5-hydroxymethylcytosine residues is mediated by a DNA repair protein, such as, for example, a glycosylase. In some embodiments, the DNA repair protein is one or more proteins selected from MBD4, SMUG1, TDG, NTHL1, NEIL1, NEIL2, or APEX1. In some embodiments, the method comprises expressing a test factor in a mammalian cell and determining whether any 5-hydroxymethylcytosine residue-decreasing activity is present in a cellular lysate by monitoring cleavage of a 5-hydroxymethylcytosine residue-containing oligonucleotide. In one embodiment, the method comprises expressing a test glycosylase in a mammalian cell, such as, for example, a 293T cell. Oligonucleotides can then be generated and end-labeled, whereby at least one oligonucleotide comprises one or more 5-hydroxymethylcytosine residues, and at least one oligonucleotide has a known substrate for the test glycosylase. The test glycosylase-expressing cells are then lysed, and the oligonucleotides are added to the lysate. In one embodiment, the oligonucleotides are then exposed to alkaline conditions, which cleave the DNA backbone at any abasic sites generated by glycosylase activity, and run on a denaturing gel to detect breaks in the oligonucleotides. For example, if both the oligonucleotide comprising the 5-hydroxymethylcytosine residue and the oligonucleotide having a known substrate for the test glycosylase are cut, it indicates that the test glycosylase recognizes 5-hydroxymethylcytosine.
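Interpretation of the denaturing gel can be written as a small decision rule. In this sketch, band intensities are assumed to be background-corrected densitometry values, and the 10% cleavage cutoff and the returned label strings are hypothetical. The known-substrate oligonucleotide serves as the positive control showing that the test glycosylase is active in the lysate, so a result for the 5-hydroxymethylcytosine oligonucleotide is only reported when that control is cut.

```python
def fraction_cleaved(cleaved_band: float, full_length_band: float) -> float:
    """Fraction of the end-labeled oligonucleotide converted to the shorter product."""
    total = cleaved_band + full_length_band
    if total <= 0:
        raise ValueError("no signal in lane")
    return cleaved_band / total


def interpret_glycosylase_assay(hmc_oligo_cleaved: float,
                                control_oligo_cleaved: float,
                                cutoff: float = 0.10) -> str:
    """Apply the both-substrates-cut logic; cutoff is a hypothetical threshold."""
    if control_oligo_cleaved < cutoff:
        # Known substrate not cut: enzyme inactive in the lysate or assay failed.
        return "inconclusive"
    if hmc_oligo_cleaved >= cutoff:
        return "recognizes 5hmC"
    return "no detectable 5hmC activity"
```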

A Kit for Enhancing Gene Transcription, Assessment of 5-Methylcytosine to 5-Hydroxymethylcytosine Conversion, and Purification of Nucleotides

Other aspects of the present invention provide kits comprising materials for performing methods according to the invention as above. A kit can be in any configuration well known to those of ordinary skill in the art and is useful for performing one or more of the methods described herein for the conversion of 5-methylcytosine to 5-hydroxymethylcytosine in cells, and the detection of 5-methylcytosine and 5-hydroxymethylcytosine in a nucleic acid.

In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, or engineered nucleic acids encoding such catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, to be contacted with a cell, or plurality of cells.

In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, and one or more compositions comprising cytokines, growth factors, and activating reagents for the purposes of generating stable human regulatory T cells. In one preferred embodiment, the compositions comprising cytokines, growth factors, and activating reagents comprise TGF-β. In one embodiment of this aspect, the kit includes packaging materials and instructions therein to use said kits.

In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments, or engineered nucleic acids encoding such catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, and the nucleic acid sequences for one or more of Oct-4, Sox2, c-MYC, and Klf4, for the purposes of improving the efficiency or rate of the generation of induced pluripotent stem cells. In some embodiments, the nucleic acid sequences for one or more of Oct-4, Sox2, c-MYC, and Klf4 are delivered in a viral vector. In some embodiments, the vector is an adenoviral vector, a lentiviral vector, or a retroviral vector. In one embodiment of this aspect, the kit includes packaging materials and instructions therein to use said kits.

In one embodiment of this aspect, the kit comprises one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, to be contacted with a cell, or plurality of cells, for the purposes of improving the efficiency of cloning mammals by nuclear transfer. In preferred embodiments, the kit includes packaging materials and instructions therein to use said kits.

In some embodiments, the kit also comprises reagents suitable for the detection of the activity of one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, namely the production of 5-hydroxymethylcytosine from 5-methylcytosine. In one preferred embodiment, the kit comprises an antibody, antigen-binding portion thereof, or protein that specifically binds to 5-hydroxymethylcytosine. In other embodiments, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof are provided in a kit to generate nucleic acids containing hydroxymethylcytosine from nucleic acids containing 5-methylcytosine, or other oxidized pyrimidines from appropriate free or nucleic acid precursors. In all such embodiments of the aspect, the kit includes packaging materials and instructions therein to use said kits.

In some embodiments of this aspect, the kit also comprises, or consists essentially of, or consists of, reagents suitable for the detection and purification of methylcytosine for use in downstream applications. In one embodiment, the kit comprises, consists essentially of, or consists of, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof for the conversion of methylcytosine to 5-hydroxymethylcytosine; one or more enzymes encoded by bacteriophages of the “T even” family; one or more glucose or glucose derivative substrates; one or more proteins to detect glucose or glucose derivative modified nucleotides; and standard DNA purification columns, buffers, and substrate solutions, as known to one of skill in the art.

In some embodiments of this aspect, the enzymes encoded by bacteriophages of the “T even” family are selected from the group consisting of alpha-glucosyltransferases, beta-glucosyltransferases, and beta-glucosyl-alpha-glucosyl-transferases. In one embodiment, the alpha-glucosyltransferases are encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages.

In some embodiments, the glucose and glucose derivative donor substrates are radiolabeled. In one such embodiment, the radiolabeled glucose and glucose derivative donor substrate is uridine diphosphate glucose (UDPG). In one such embodiment, the UDPG is radiolabeled with ¹⁴C. In one embodiment, the UDPG is radiolabeled with ³H.

In some embodiments, the proteins that recognize glucose or glucose derivative modified nucleotides are selected from a group comprising a lectin, an antibody or antigen-binding fragment thereof, or an enzyme. In some embodiments, the proteins recognize only the glucose residue. In some embodiments, the proteins recognize the residue in the context of hydroxymethylcytosine. In one embodiment, the antibody or antibody fragment thereof is modified with several tags. In one embodiment, the tags are biotin molecules or beads. In one embodiment, the antibody is modified with gold or fluorescent tags. In one embodiment, the enzyme is a hexokinase or a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the lectin is Musa acuminata lectin (BanLec). In one embodiment, the lectin is modified with gold or fluorescent tags.

In all such embodiments of the aspect, the kit includes the necessary packaging materials and informational material therein to use said kits. The informational material can be descriptive, instructional, marketing or other material that relates to the methods described herein and/or the use of a compound(s) described herein for the methods described herein. In one embodiment, the informational material can include information about production of the compound, molecular weight of the compound, concentration, date of expiration, batch or production site information, and so forth. In one embodiment, the informational material relates to methods for culturing the compound. In one embodiment, the informational material can include instructions to culture a compound(s) (e.g., a TET family enzyme) described herein in a suitable manner to perform the methods described herein, e.g., in a suitable dose, dosage form, or mode of administration (e.g., a dose, dosage form, or mode of administration described herein) (e.g., to a cell in vitro or a cell in vivo). In another embodiment, the informational material can include instructions to administer a compound(s) described herein to a suitable subject, e.g., a human, e.g., a human having or at risk for a disorder described herein, or to a cell in vitro.

The informational material of the kits is not limited in its form. In many cases, the informational material, e.g., instructions, is provided in printed matter, e.g., a printed text, drawing, and/or photograph, e.g., a label or printed sheet. However, the informational material can also be provided in other formats, such as Braille, computer readable material, video recording, or audio recording. In another embodiment, the informational material of the kit is contact information, e.g., a physical address, email address, website, or telephone number, where a user of the kit can obtain substantive information about a compound described herein and/or its use in the methods described herein. Of course, the informational material can also be provided in any combination of formats.

In all embodiments of the aspects described herein, the kit will typically be provided with its various elements included in one package, e.g., a fiber-based, e.g., a cardboard, or polymeric, e.g., a styrofoam box. The enclosure can be configured so as to maintain a temperature differential between the interior and the exterior, e.g., it can provide insulating properties to keep the reagents at a preselected temperature for a preselected time. The kit can include one or more containers for the composition containing a compound(s) described herein. In some embodiments, the kit contains separate containers (e.g., two separate containers for the two agents), dividers or compartments for the composition(s) and informational material. For example, the composition can be contained in a bottle, vial, or syringe, and the informational material can be contained in a plastic sleeve or packet. In other embodiments, the separate elements of the kit are contained within a single, undivided container. For example, the composition is contained in a bottle, vial or syringe that has attached thereto the informational material in the form of a label. In some embodiments, the kit includes a plurality (e.g., a pack) of individual containers, each containing one or more unit dosage forms (e.g., a dosage form described herein) of a compound described herein. For example, the kit includes a plurality of syringes, ampules, foil packets, or blister packs, each containing a single unit dose of a compound described herein. The containers of the kits can be airtight, waterproof (e.g., impermeable to changes in moisture or evaporation), and/or light-tight. The kit optionally includes a device suitable for administration of the composition, e.g., a syringe, inhalant, pipette, forceps, measured spoon, dropper (e.g., eye dropper), swab (e.g., a cotton swab or wooden swab), or any such delivery device. In a preferred embodiment, the device is a medical implant device, e.g., packaged for surgical insertion.

Methods of Improving Stem Cell Therapies Using TET Family Proteins

Stem cell bioengineering is an emerging technology that holds great promise for the therapeutic treatment of a wide range of disorders. A fundamental problem in the field relates to understanding mechanisms whereby stem cell differentiation and lineage commitment can be controlled in vitro so that the bioengineered stem cells may be used in vivo. A method that could easily be adapted to generate a wide range of stem cell types would allow a multitude of therapeutic applications to be developed. Human embryonic stem cell research and consequent therapeutic applications could provide treatments for a variety of conditions and disorders, including Alzheimer's disease, spinal cord injuries, amyotrophic lateral sclerosis, Parkinson's disease, type 1 diabetes, and cardiovascular diseases. Stem cells that could be readily differentiated into desired cell types could also be useful for a number of tissue engineering applications, such as the production of complete organs, including livers, kidneys, eyes, hearts, or even parts of the brain. In addition, the ability to control stem cell proliferation and differentiation has applicability in developing targeted drug treatments.

The present invention relates, in part, to novel methods and compositions that enhance stem cell therapies. One aspect of the present invention includes compositions and methods of inducing stem cells to differentiate into a desired cell type by contacting a stem cell or a plurality of stem cells with, or delivering to a stem cell or a plurality of stem cells, one or more catalytically active TET family enzymes, one or more functional TET family derivatives, or one or more TET catalytically active fragments thereof, or engineered nucleic acids encoding one or more of such catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, to increase pluripotency of said cell being contacted or delivered to.

As defined herein, “stem cells” are primitive undifferentiated cells having the capacity to differentiate and mature into other cell types, for example, brain, muscle, liver and blood cells. Stem cells are typically classified as either embryonic stem cells or adult tissue-derived stem cells, depending on the source of the tissue from which they are derived. “Pluripotent stem cells”, as defined herein, are undifferentiated cells having the potential to differentiate to derivatives of all three embryonic germ layers (endoderm, mesoderm, and ectoderm). Adult progenitor cells are adult stem cells which can give rise to a limited number of particular types of cells, such as hematopoietic progenitor cells. Stem cells for use with the present invention may be obtained from any source. By way of example, pluripotent stem cells can be isolated from the primordial germinal ridge of the developing embryo, from teratocarcinomas, and from non-embryonic tissues, including but not limited to the bone marrow, brain, liver, pancreas, peripheral blood, fat tissue, placenta, skeletal muscle, chorionic villus, and umbilical cord blood. The methods and compositions of the present invention may be used with and include embryonic stem cells. Embryonic stem cells are typically derived from the inner cell mass of blastocyst-stage embryos (Odorico et al. 2001, Stem Cells 19:193-204; Thomson et al. 1995, Proc Natl Acad Sci USA 92:7844-7848; Thomson et al. 1998, Science 282:1145-1147). The distinguishing characteristics of stem cells are (i) their ability to be cultured in their non-differentiated state and (ii) their capacity to give rise to differentiated daughter cells representing all three germ layers of the embryo and the extra-embryonic cells that support development. Embryonic stem cells have been isolated from other sites in the embryo. Embryonic stem cells may be induced to undergo lineage-specific differentiation in response to soluble factors.

According to certain embodiments, the stem cells are of human origin. According to one embodiment, the stem cells are selected from embryonic stem cells and adult stem cells. The adult stem cell can be a pluripotent cell or a partially committed progenitor cell.

According to certain embodiments, the composition comprises genetically modified stem cells. Typically, the cells are transformed with a suitable vector comprising a nucleic acid sequence for effecting the desired genetic alteration, as is known to a person skilled in the art.

According to certain embodiments, the stem cells may be partially committed progenitors isolated from several tissue sources. In some embodiments, the partially committed progenitors are hematopoietic cells, neural progenitor cells, oligodendrocyte cells, skin cells, hepatic cells, muscle cells, bone cells, mesenchymal cells, pancreatic cells, chondrocytes or marrow stromal cells.

Such stem cells, upon contact with, or delivery of, one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof, can then be utilized for stem cell therapy treatments, wherein said contacted cell can undergo further manipulations to differentiate into a desired cell type for use in treatment of a disorder requiring cell or tissue replacement.

The differentiated stem cells of the present invention may be used as any other differentiated stem cell. By way of a non-limiting example, differentiated stem cells of the present invention can be used for tissue reconstitution or regeneration in a human patient in need thereof. The differentiated stem cells are administered in a manner that permits them to graft to the intended tissue site and reconstitute or regenerate the functionally deficient area. One method of administration is delivery through a peripheral blood vessel of the subject, given that stem cells are preferentially attracted to damaged areas. Another form of administration is by selective catheterization at or around the site of damage, which can lead to almost complete delivery of the stem cells into a damaged area.

Methods of Diagnosing and Treating Cancer

The present invention also provides, in part, improved methods for the diagnosis and treatment of cancer by the administration of compositions modulating catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments thereof. Also encompassed in the methods of the present invention are methods of screening for the identification of TET family modulators. Such methods can be used to modify or determine, for example, treatments to be administered to an individual having or being predisposed to cancer.

Deregulation of gene expression is a hallmark of cancer. Although genetic lesions have been the focus of cancer research for many years, it has become increasingly recognized that aberrant epigenetic modifications also play major roles in the tumorigenic process. These modifications are imposed on chromatin, do not change the nucleotide sequence of DNA, and are manifested by specific patterns of gene expression that are heritable through many cell divisions. When a general role for DNA methylation in gene silencing was established more than 25 years ago, it was proposed that aberrant patterns of DNA methylation might play a role in tumorigenesis. Initial studies found evidence for a decrease in the total 5-methylcytosine content in tumor cells, and the occurrence of global hypomethylation in cancer was firmly established in subsequent studies. Hypomethylation occurs primarily at DNA repetitive elements and is believed to contribute to the genomic instability frequently seen in cancer. Hypomethylation can also contribute to overexpression of oncogenic proteins, as was shown for the loss of imprinting of IGF2 (insulin-like growth factor 2), which leads to aberrant activation of the normally silent maternally inherited allele. This was found to be associated with an increased risk for colon cancer. The mechanisms underlying global hypomethylation patterns are the focus of intensive research (E. N. Gal-Yam, Annu Rev Med 59: 267-280 (2008)).

Aberrant hypermethylation at normally unmethylated CpG islands occurs in parallel to global hypomethylation. Rb (retinoblastoma), whose CpG island promoter was found to be hypermethylated in retinoblastoma, was the first tumor suppressor gene shown to harbor such a modification. This discovery was soon followed by studies showing promoter hypermethylation and silencing of other tumor suppressor genes, including, but not limited to, VHL (von Hippel-Lindau) in renal cancer, the cell cycle regulator CDKN2A/p16 in bladder cancer, and the mismatch repair gene hMLH1 in colon cancer. It is now established that aberrant hypermethylation at CpG island promoters is a hallmark of cancer. Notably, not only protein-coding genes undergo these modifications; CpG island promoters of noncoding microRNAs were shown to be hypermethylated in tumors, possibly contributing to their proposed roles in carcinogenesis (Id.).

The origins of the dysregulated methylation patterns in cancer are an active area of research. Initially it was suggested that, like genetic mutations, de novo hypermethylation events are stochastically generated, and that the final patterns observed are a result of growth advantage and selection. However, several observations made in recent years should be noted. First, hypermethylation events are already apparent at precancerous stages, such as in benign tumors and in tumor-predisposing inflammatory lesions. Second, there seem to be defined sets of hypermethylated genes in certain tumors. These differential methylation signatures, or “methylomes,” may even differentiate between tumors of the same type, as was recently shown for the CpG island methylator phenotype (CIMP) in colon cancer. Third, although many hypermethylated genes have tumor-suppressing functions, not all are involved in cell growth or tumorigenesis (Id.).

One object of the present invention relates to methods for treating an individual with, or at risk for, cancer by using an agent that modulates the hydroxylase activity of the catalytically active TET family enzymes, functional TET family derivatives, or TET catalytically active fragments.

Accordingly, in one aspect the invention provides a method for treating an individual with or at risk for cancer using an effective amount of one or more modulators of the activity of the TET family of proteins. In one embodiment of the aspect, the method includes selecting a treatment for a patient affected by or at risk for developing cancer by determining the presence or absence of hypermethylated CpG island promoters of tumor suppressor genes, wherein, if hypermethylation of tumor suppressor genes is detected, one administers to the individual an effective amount of a catalytically active TET family enzyme, a functional TET family derivative, a TET catalytically active fragment thereof, an activating modulator of TET family activity, or any combination thereof, to reactivate tumor suppressor activity.
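The determining step above can be pictured as a simple per-promoter call. The Python sketch below is a minimal illustration, assuming hypothetical inputs of methylated and total read counts per tumor suppressor promoter (for example, from a bisulfite-based assay, which on its own does not distinguish 5-methylcytosine from 5-hydroxymethylcytosine). The `PromoterCounts` type, the 0.3 methylation cutoff, and the 10-read minimum coverage are all assumptions for illustration, not values taken from this document, and the output only flags promoters; it does not select a treatment.

```python
from dataclasses import dataclass


@dataclass
class PromoterCounts:
    """Hypothetical per-promoter read counts from a methylation assay."""
    gene: str
    methylated_reads: int
    total_reads: int


def methylation_fraction(p: PromoterCounts) -> float:
    """Fraction of reads at this promoter that were called methylated."""
    if p.total_reads <= 0:
        raise ValueError(f"no reads for {p.gene}")
    return p.methylated_reads / p.total_reads


def hypermethylated_genes(promoters, threshold=0.3, min_reads=10):
    """Return genes whose promoter methylation fraction meets a hypothetical cutoff."""
    flagged = []
    for p in promoters:
        if p.total_reads < min_reads:
            continue  # too little coverage to make a call either way
        if methylation_fraction(p) >= threshold:
            flagged.append(p.gene)
    return flagged


if __name__ == "__main__":
    sample = [
        PromoterCounts("VHL", 45, 50),
        PromoterCounts("CDKN2A", 2, 40),
        PromoterCounts("MLH1", 5, 6),  # below the coverage minimum, so skipped
    ]
    print(hypermethylated_genes(sample))
```

Skipping low-coverage promoters rather than calling them unmethylated is a deliberate choice in this sketch: absence of evidence at a poorly covered promoter should not be read as absence of hypermethylation.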

In one embodiment, the treatment involves the administration of a TET family inhibiting modulator. In particular, the TET family inhibiting modulator is specific to TET1, TET2, TET3, or CXXC4. In one embodiment of the aspect, the cancer being treated is a leukemia. In one embodiment, the leukemia is acute myeloid leukemia caused by the t(10;11)(q22;q23) Mixed Lineage Leukemia (MLL) translocation involving TET1. In one embodiment, the TET family inhibiting modulator is specific to TET2.

The present invention also provides, in another aspect, improved methods for the diagnosis of disease conditions by creating methylome or hydroxymethylome signatures for stratifying subjects at risk for a disease condition, and for directing therapy and monitoring the response to the therapy in subjects. In some embodiments of the aspect, methods to detect methylcytosine and 5-hydroxymethylcytosine in DNA from a subject diagnosed with or at risk for a disease condition are provided, wherein enzymes encoded by bacteriophages of the “T even” family are contacted with the DNA and the global levels of methylation and hydroxymethylation are determined. In one embodiment, the DNA is obtained from a diseased tissue sample of the subject. In one embodiment, the enzyme provided is an alpha-glucosyltransferase. In one embodiment, the alpha-glucosyltransferase provided is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the enzyme is a beta-glucosyltransferase. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In some embodiments, enzymes encoded by bacteriophages of the “T even” family add two glucose molecules linked in a beta-1,6 configuration to hydroxymethylcytosine to form gentiobiose-containing hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In one embodiment, the disease condition is a myeloproliferative disorder, a myelodysplastic disorder, acute myelogenous leukemia, or another malignant or pre-malignant condition.
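T-even glucosyltransferases label hydroxymethylcytosine, so a glucosylation readout on its own reports 5-hydroxymethylcytosine. One way to reach global methylation as well, combining that readout with the TET-mediated conversion of 5-methylcytosine to 5-hydroxymethylcytosine on which this invention is based, is to measure the same DNA before and after in vitro TET treatment and attribute the gain to converted 5-methylcytosine. The Python sketch below shows only that arithmetic. It assumes that TET converts 5-methylcytosine to 5-hydroxymethylcytosine and leaves pre-existing 5-hydroxymethylcytosine unchanged, and it treats `conversion_efficiency` as a hypothetical, separately measured parameter; the function name is illustrative.

```python
def partition_global_marks(hmc_untreated_pct, hmc_after_tet_pct,
                           conversion_efficiency=1.0):
    """Estimate global 5mC and 5hmC (percent of cytosines) from two glucosylation readouts.

    hmc_untreated_pct: 5hmC measured without TET pre-treatment (endogenous 5hmC).
    hmc_after_tet_pct: 5hmC measured after in vitro TET treatment of the same DNA.
    conversion_efficiency: assumed fraction of 5mC that TET converts (hypothetical).
    """
    if not 0.0 < conversion_efficiency <= 1.0:
        raise ValueError("conversion_efficiency must be in (0, 1]")
    gained = hmc_after_tet_pct - hmc_untreated_pct
    if gained < 0:
        raise ValueError("TET-treated signal should not be lower than the untreated signal")
    # Correct the gain for incomplete conversion to estimate the original 5mC content.
    estimated_5mc = gained / conversion_efficiency
    return estimated_5mc, hmc_untreated_pct


if __name__ == "__main__":
    print(partition_global_marks(0.5, 4.5, conversion_efficiency=0.8))
```

Because the estimate divides by the conversion efficiency, an overestimated efficiency will understate global methylation; that parameter would need its own control (for example, a fully methylated reference DNA) in practice.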

In some embodiments of the aspect, methods to detect global levels of methylcytosine and 5-hydroxymethylcytosine in DNA from a subject with a familial predisposition for a disease condition are provided, wherein enzymes encoded by bacteriophages of the “T even” family are contacted with the DNA. In one embodiment, the enzyme provided is an alpha-glucosyltransferase. In one embodiment, the alpha-glucosyltransferase provided is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages. In one embodiment, the enzyme is a beta-glucosyltransferase. In one embodiment, the beta-glucosyltransferase is encoded by a bacteriophage selected from T4 bacteriophages. In some embodiments, enzymes encoded by bacteriophages of the “T even” family add two glucose molecules linked in a beta-1,6 configuration to hydroxymethylcytosine to form gentiobiose-containing hydroxymethylcytosine. In one embodiment, the enzyme is a beta-glucosyl-alpha-glucosyl-transferase. In one embodiment, the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages. In one embodiment, the disease condition is a myeloproliferative disorder, a myelodysplastic disorder, acute myelogenous leukemia, or another malignant or pre-malignant condition. In one embodiment, the DNA is isolated from the CD34+ hematopoietic cells of a family member of a subject with a disease condition, to determine if there is a familial predisposition.

Also encompassed in the methods of the present invention are methods for screening for and identifying drugs that cause alterations in the methylcytosine and 5-hydroxymethylcytosine residues in genomic DNA using the compositions and methods described herein.

As defined herein, the phrase “genetic predisposition” refers to the genetic makeup of a subject or cell that makes or predetermines the subject's or cell's likelihood of being susceptible to a particular disease, disorder or malignancy, or likelihood of responding to a treatment for a disease, disorder or malignancy. Accordingly, as defined herein, an individual having a “familial predisposition” refers to the subject or individual having one or more family members that have had, have, or have an increased likelihood of developing, a particular disease, disorder or malignancy, such as cancer. The familial predisposition may be due to one or more underlying genetic mutations, or can be caused by shared environmental risk factors in the family members, or be a combination thereof.

As defined herein, a “cancer”, “malignancy”, or “malignant condition” refers to the presence of cells possessing characteristics typical of cancer-causing cells, such as uncontrolled proliferation, immortality, metastatic potential, rapid growth and proliferation rate, and certain characteristic morphological features. Often, cancer cells will be in the form of a tumor that exists locally, but such cells may also exist alone within a patient or circulate in the blood stream as independent, non-tumorigenic cancer cells, for example, leukemia cells. Examples of cancers in which methylation status plays a role include, but are not limited to, breast cancer, melanoma, adrenal gland cancer, biliary tract cancer, bladder cancer, brain or central nervous system cancer, bronchus cancer, blastoma, carcinoma, chondrosarcoma, cancer of the oral cavity or pharynx, cervical cancer, colon cancer, colorectal cancer, esophageal cancer, gastrointestinal cancer, glioblastoma, hepatic carcinoma, hepatoma, kidney cancer, leukemia, liver cancer, lung cancer, lymphoma, non-small cell lung cancer, osteosarcoma, ovarian cancer, pancreas cancer, peripheral nervous system cancer, prostate cancer, sarcoma, salivary gland cancer, small bowel or appendix cancer, small-cell lung cancer, squamous cell cancer, stomach cancer, testis cancer, thyroid cancer, urinary bladder cancer, uterine or endometrial cancer, and vulval cancer.

“Leukemia” is a cancer of the blood or bone marrow and is characterized by an abnormal proliferation of white blood cells, i.e., leukocytes. There are four major classifications of leukemia: acute lymphoblastic leukemia (ALL), chronic lymphocytic leukemia (CLL), acute myelogenous leukemia or acute myeloid leukemia (AML), and chronic myelogenous leukemia (CML).

“Acute myeloid leukemia” (AML), also known as acute myelogenous leukemia, is a cancer of the myeloid line of white blood cells, characterized by the rapid proliferation of abnormal myeloid cells that accumulate in the bone marrow and interfere with the production of normal blood cells. AML is the most common acute leukemia affecting adults, and its incidence increases with age. The World Health Organization (WHO) classification of subtypes of acute myeloid leukemia comprises: a) AML with characteristic genetic abnormalities, including, but not limited to, AML with translocations between chromosomes 10 and 11 [t(10;11)], chromosomes 8 and 21 [t(8;21)], and chromosomes 15 and 17 [t(15;17)], and inversions in chromosome 16 [inv(16)]; b) AML with multilineage dysplasia, which includes patients who have had a prior myelodysplastic syndrome (MDS) or myeloproliferative disease that transforms into AML; c) therapy-related AML and MDS, which category includes patients who have had prior chemotherapy and/or radiation and subsequently develop AML or MDS (these leukemias may also be characterized by specific chromosomal abnormalities); d) AML not otherwise categorized, which includes subtypes of AML that do not fall into the above categories; and e) acute leukemias of ambiguous lineage, which occur when the leukemic cells cannot be classified as either myeloid or lymphoid cells, or where both types of cells are present.
Acute myeloid leukemias can further be classified or diagnosed as: minimally differentiated acute myeloblastic leukemia (M0); acute myeloblastic leukemia without maturation (M1); acute myeloblastic leukemia with granulocytic maturation (M2) (caused by t(8;21)(q22;q22) or t(6;9)); promyelocytic or acute promyelocytic leukemia (APL) (M3) (caused by t(15;17)); acute myelomonocytic leukemia (M4) (caused by inv(16)(p13q22) or del(16q)); myelomonocytic leukemia together with bone marrow eosinophilia (M4eo) (caused by inv(16) or t(16;16)); acute monoblastic leukemia (M5a) or acute monocytic leukemia (M5b) (caused by del(11q), t(9;11), or t(11;19)); acute erythroid leukemias, including erythroleukemia (M6a) and very rare pure erythroid leukemia (M6b); acute megakaryoblastic leukemia (M7) (caused by t(1;22)); and acute basophilic leukemia (M8).

In connection with the administration of a TET family modulator, a drug that is “effective against” a cancer indicates that administration in a clinically appropriate manner results in a beneficial effect for at least a statistically significant fraction of patients, such as an improvement of symptoms, a cure, a reduction in disease load, a reduction in tumor mass or cell numbers, extension of life, improvement in quality of life, or other effect generally recognized as positive by medical doctors familiar with treating the particular type of disease or condition.

In connection with determining or modifying a treatment to be administered to an individual having a cancer, or having a familial predisposition to a cancer, such as a leukemia, the treatment can include, for example, imatinib (Gleevec), all-trans-retinoic acid, a monoclonal antibody treatment (gemtuzumab ozogamicin), chemotherapy (for example, chlorambucil, prednisone, prednisolone, vincristine, cytarabine, clofarabine, farnesyl transferase inhibitors, decitabine, inhibitors of MDR1, rituximab, interferon-α, anthracycline drugs (such as daunorubicin or idarubicin), L-asparaginase, doxorubicin, cyclophosphamide, bleomycin, fludarabine, etoposide, pentostatin, or cladribine), bone marrow transplant, stem cell transplant, radiation therapy, anti-metabolite drugs (methotrexate and 6-mercaptopurine), or any combination thereof. The modification of the treatment based upon, for example, determination of the hydroxymethylation status of a cell, or of TET family activity, includes, but is not limited to, changing the dosage, frequency, duration, or type of treatment(s) being administered to a patient in need thereof.

A “TET family modulator” is a molecule that acts to either increase or reduce the production and/or accumulation of TET family gene product activity in a cell. The molecule can thus either enhance or prevent the accumulation at any step of the pathway leading from the TET family gene to TET family enzymatic activity, e.g., transcription, mRNA levels, translation, or the enzyme itself. As used interchangeably herein, an “inhibitor”, “inhibiting modulator” or “inhibitory modulator” of the TET family is a molecule that acts to reduce the production and/or accumulation of TET family gene product activity in a cell. The inhibitor, inhibiting modulator or inhibitory modulator molecule can thus prevent the accumulation at any step of the pathway leading from the TET family gene to the TET family enzymatic activity, e.g., preventing transcription, reducing mRNA levels, preventing translation, or inhibiting the enzyme itself. Similarly, as used interchangeably herein, an “activator” or “activating modulator” of the TET family is a molecule that acts to increase the production and/or accumulation of TET family gene product activity in a cell. The TET family activator or activating modulator molecule can thus enhance the accumulation at any step of the pathway leading from the TET family gene to TET family enzymatic activity, e.g., enhancing transcription, increasing mRNA levels, enhancing translation, or activating the enzyme itself.

In one embodiment of the present aspect, the TET family targeting treatment is a TET family inhibitor. In a preferred embodiment, the TET targeting treatment is specific for the inhibition of TET1, TET2, TET3, or CXXC4, for example, a small molecule inhibitor, a competitive inhibitor, an antibody or antigen-binding fragment thereof, or a nucleic acid that inhibits TET1, TET2, TET3, or CXXC4, as encompassed under “Definitions”.

In one embodiment of the present aspect, the TET family targeting treatment is a TET family activator. Alternatively, and preferably, the TET targeting treatment is specific for the activation of TET1, TET2, TET3, or CXXC4, for example, a small molecule activator, an agonist, an antibody or antigen-binding fragment thereof, or a nucleic acid that activates TET1, TET2, TET3, or CXXC4, as defined under “Definitions”.

Also encompassed in the methods of the present aspect are methods to screen for the identification of a TET family modulator for use in anti-cancer therapies. The method comprises a) providing a cell comprising a TET family enzyme or recombinant TET family enzyme thereof; b) contacting said cell with a test molecule; c) comparing the relative levels of 5-hydroxymethylated cytosine in cells expressing the TET family enzyme or recombinant TET family enzyme thereof in the presence of the test molecule with the level of 5-hydroxymethylated cytosine measured in a control sample in the absence of the test molecule; and d) determining whether or not the test molecule increases or decreases the level of 5-hydroxymethylated cytosine, wherein a statistically significant decrease in the level of 5-hydroxymethylated cytosine indicates the molecule is an inhibitor and a statistically significant increase in the level of 5-hydroxymethylated cytosine indicates the molecule is an activator.
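Steps c) and d) call for a statistically significant difference but do not name a test. As one possible choice, the Python sketch below applies a two-sided permutation test to replicate 5-hydroxymethylcytosine measurements from treated and control samples and then labels the test molecule by the direction of the change. The permutation test, the 0.05 significance level, the fixed random seed, and the function names are all assumptions for illustration; a t-test or another method suited to the assay's noise could equally be used.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean 5hmC level."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(_mean(treated) - _mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_treated]) - _mean(pooled[n_treated:]))
        # Small tolerance so float rounding does not hide exact ties with the observed split.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def classify_test_molecule(treated, control, alpha=0.05, **kwargs):
    """Label a test molecule as inhibitor, activator, or no significant effect."""
    p = permutation_p_value(treated, control, **kwargs)
    if p >= alpha:
        return "no significant effect", p
    label = "activator" if _mean(treated) > _mean(control) else "inhibitor"
    return label, p


if __name__ == "__main__":
    # Hypothetical replicate 5hmC levels (arbitrary units) with and without the test molecule.
    print(classify_test_molecule([0.10, 0.12, 0.11, 0.09], [0.50, 0.52, 0.49, 0.51]))
```

With only four replicates per arm, the smallest attainable two-sided p-value is about 2/70, so the design itself limits how strong a call the screen can make; more replicates tighten this.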

In another embodiment of the aspect, a method for high-throughput screening for anti-cancer agents is provided. The method comprises screening for and identifying TET family modulators. For example, a combinatorial library containing a large number of potential therapeutic compounds (potential modulator compounds) can be provided. Such “combinatorial chemical libraries” are then screened in one or more assays to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity (e.g., inhibition or activation of TET family-mediated conversion of 5-methylcytosine to 5-hydroxymethylcytosine). The compounds thus identified can serve as conventional “lead compounds” or “candidate therapeutic agents,” and can be derivatized for further testing to identify additional TET family modulators.
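For a library screen, each compound usually gets a single well, so hits are typically called against on-plate controls rather than by per-compound replicate statistics. The Python sketch below shows one common way to do this, under assumptions not stated in this document: "high" control wells report uninhibited TET activity, "low" control wells report no TET activity, each compound's 5-hydroxymethylcytosine signal is normalized to percent activity between them, and compounds beyond a hypothetical ±3 standard deviations of the high controls are flagged as inhibitors or activators. The Z'-factor, 1 − 3(σ_high + σ_low)/|μ_high − μ_low|, is included as a standard assay-quality check; values of 0.5 or more are commonly read as a robust assay.

```python
from statistics import mean, stdev


def z_prime(high_ctrl, low_ctrl):
    """Z'-factor: 1 - 3*(sd_high + sd_low) / |mean_high - mean_low|."""
    spread = 3 * (stdev(high_ctrl) + stdev(low_ctrl))
    return 1 - spread / abs(mean(high_ctrl) - mean(low_ctrl))


def call_hits(compound_signals, high_ctrl, low_ctrl, n_sd=3.0):
    """Flag compounds whose normalized 5hmC signal departs from the uninhibited control."""
    hi, lo = mean(high_ctrl), mean(low_ctrl)

    def to_percent(signal):
        # 0 % = no TET activity (low control); 100 % = uninhibited TET activity (high control).
        return 100.0 * (signal - lo) / (hi - lo)

    sd_pct = stdev([to_percent(s) for s in high_ctrl])
    lower, upper = 100.0 - n_sd * sd_pct, 100.0 + n_sd * sd_pct
    hits = {}
    for name, signal in compound_signals.items():
        pct = to_percent(signal)
        if pct < lower:
            hits[name] = ("inhibitor", pct)
        elif pct > upper:
            hits[name] = ("activator", pct)
    return hits


if __name__ == "__main__":
    high = [1000, 1020, 980, 1000]  # hypothetical signals, no test compound
    low = [100, 110, 90, 100]       # hypothetical signals, no TET activity
    compounds = {"cpd_A": 300, "cpd_B": 1010, "cpd_C": 1200}
    print(z_prime(high, low))
    print(call_hits(compounds, high, low))
```

Normalizing to percent activity makes thresholds comparable across plates whose raw signal drifts, which is the main reason for carrying both control types on every plate.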

Once identified, such compounds are administered to patients in need of TET family targeted treatment, for example, patients affected with, or at risk for developing, cancer or cancer metastasis. The route of administration may be intravenous (I.V.), intramuscular (I.M.), subcutaneous (S.C.), intradermal (I.D.), intraperitoneal (I.P.), intrathecal (I.T.), intrapleural, intrauterine, rectal, vaginal, topical, intratumor and the like. The compounds of the invention can be administered parenterally by injection or by gradual infusion over time and can be delivered by peristaltic means. Administration may be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, bile salts and fusidic acid derivatives. In addition, detergents may be used to facilitate permeation. Transmucosal administration may be through nasal sprays, for example, or using suppositories. For oral administration, the compounds of the invention are formulated into conventional oral administration forms such as capsules, tablets and tonics. For topical administration, the pharmaceutical composition (e.g., inhibitor of TET family activity) is formulated into ointments, salves, gels, or creams, as is generally known in the art. The therapeutic compositions of this invention are conventionally administered intravenously, as by injection of a unit dose, for example. The term “unit dose” when used in reference to a therapeutic composition of the present invention refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.
The compositions are administered in a manner compatible with the dosage formulation, and in a therapeutically effective amount. The quantity to be administered and the timing depend on the subject to be treated, the capacity of the subject's system to utilize the active ingredient, and the degree of therapeutic effect desired.

Any formulation or drug delivery system containing the active ingredients required for TET family modulation that is suitable for the intended use, as generally known to those of skill in the art, can be used. Suitable pharmaceutically acceptable carriers for oral, rectal, topical or parenteral (including inhaled, subcutaneous, intraperitoneal, intramuscular and intravenous) administration are known to those of skill in the art. The carrier must be pharmaceutically acceptable in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient thereof. As used herein, the terms “pharmaceutically acceptable”, “physiologically tolerable” and grammatical variations thereof, as they refer to compositions, carriers, diluents and reagents, are used interchangeably and represent that the materials are capable of administration to or upon a mammal without the production of undesirable physiological effects.

Definitions

As used herein, the term “drug” or “compound” refers to a chemical entity or biological product, or combination of chemical entities or biological products, administered to a person to treat, prevent or control a disease or condition. The chemical entity or biological product is preferably, but not necessarily, a low molecular weight compound, but may also be a larger compound, for example, an oligomer of nucleic acids, amino acids, or carbohydrates including, without limitation, proteins, oligonucleotides, ribozymes, DNAzymes, glycoproteins, siRNAs, lipoproteins, aptamers, and modifications and combinations thereof.

The terms “effective” and “effectiveness”, as used herein, include both pharmacological effectiveness and physiological safety. Pharmacological effectiveness refers to the ability of the treatment to result in a desired biological effect in the patient. Physiological safety refers to the level of toxicity, or other adverse physiological effects at the cellular, organ and/or organism level (often referred to as side-effects), resulting from administration of the treatment. “Less effective” means that the treatment results in a therapeutically significant lower level of pharmacological effectiveness and/or a therapeutically greater level of adverse physiological effects.

As used herein, the phrases “therapeutically effective amount” and “effective amount” are used interchangeably and refer to the amount of an agent that is effective, at dosages and for periods of time necessary, to achieve the desired therapeutic result, e.g., an increase in hydroxymethylation for a TET family activator, or a decrease or prevention of hydroxymethylation for a TET family inhibitor. An effective amount for treating a disease related to defects in methylation is an amount sufficient to result in a reduction or amelioration of the symptoms of the disorder, disease, or medical condition. By way of example only, an effective amount of a TET family inhibitor for treatment of a disease characterized by an increase in hydroxymethylation will cause a decrease in hydroxymethylation. An effective amount for treating such a hydroxymethylation-related disease (i.e., one characterized by an increase in hydroxymethylation) is an amount sufficient to result in a reduction or amelioration of the symptoms of the disorder, disease, or medical condition. The effective amount of a given therapeutic agent (i.e., a TET family inhibitor or TET family activator) will vary with factors such as the nature of the agent, the route of administration, the size and species of the animal, such as a human, to receive the therapeutic agent, and the purpose of the administration.

A therapeutically effective amount of the agents, factors, or inhibitors described herein, or functional derivatives thereof, can vary according to factors such as the disease state, age, sex, and weight of the subject, and the ability of the therapeutic compound to elicit a desired response in the individual or subject. A therapeutically effective amount is also one in which any toxic or detrimental effects of the therapeutic agent are outweighed by the therapeutically beneficial effects. The effective amount in each individual case can be determined empirically by a skilled artisan according to established methods in the art and without undue experimentation. Efficacy of treatment can be judged by an ordinarily skilled practitioner. Efficacy can be assessed in animal models of cancer and tumor, for example by treating a rodent bearing an experimental cancer; any treatment with, or administration of, a TET family inhibitor in a composition or formulation that leads to a decrease in at least one symptom of the cancer, for example a reduction in the size of the tumor, is indicative of efficacy.

As used herein, the phrase “pharmaceutically acceptable”, and grammatical variations thereof, as they refer to compositions, carriers, diluents and reagents, are used interchangeably and represent that the materials are capable of administration to or upon a mammal without the production of undesirable physiological effects such as nausea, dizziness, gastric upset and the like. Each carrier must also be “acceptable” in the sense of being compatible with the other ingredients of the formulation. A pharmaceutically acceptable carrier typically will not promote the raising of an immune response to an agent with which it is admixed, unless so desired. The preparation of a pharmacological composition that contains active ingredients dissolved or dispersed therein is well understood in the art and need not be limited based on formulation. The pharmaceutical formulation contains a compound of the invention in combination with one or more pharmaceutically acceptable ingredients. The carrier can be in the form of a solid, semi-solid or liquid diluent, cream or a capsule. Typically such compositions are prepared as injectables, either as liquid solutions or suspensions; however, solid forms suitable for solution or suspension in liquid prior to use can also be prepared. The preparation can also be emulsified or presented as a liposome composition. The active ingredient can be mixed with excipients which are pharmaceutically acceptable and compatible with the active ingredient and in amounts suitable for use in the therapeutic methods described herein. Suitable excipients are, for example, water, saline, dextrose, glycerol, ethanol or the like and combinations thereof. In addition, if desired, the composition can contain minor amounts of auxiliary substances such as wetting or emulsifying agents, pH buffering agents and the like which enhance the effectiveness of the active ingredient.
The therapeutic composition of the present invention can include pharmaceutically acceptable salts of the components therein. Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of the polypeptide) that are formed with inorganic acids such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, tartaric, mandelic and the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine and the like. Physiologically tolerable carriers are well known in the art. Exemplary liquid carriers are sterile aqueous solutions that contain no materials in addition to the active ingredients and water, or contain a buffer such as sodium phosphate at physiological pH value, physiological saline or both, such as phosphate-buffered saline. Still further, aqueous carriers can contain more than one buffer salt, as well as salts such as sodium and potassium chlorides, dextrose, polyethylene glycol and other solutes. Liquid compositions can also contain liquid phases in addition to, and to the exclusion of, water. Exemplary of such additional liquid phases are glycerin, vegetable oils such as cottonseed oil, and water-oil emulsions. The amount of an active agent used in the invention that will be effective in the treatment of a particular disorder or condition will depend on the nature of the disorder or condition, and can be determined by standard clinical techniques. The phrase “pharmaceutically acceptable carrier or diluent” means a pharmaceutically acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, solvent or encapsulating material, involved in carrying or transporting the subject agents from one organ, or portion of the body, to another organ, or portion of the body.

The terms “subject” and “individual” are used interchangeably herein and refer to an animal, for example, a human, from whom cells can be obtained (i.e., differentiated cells that are to be reprogrammed) and/or to whom treatment, including prophylactic treatment, with the reprogrammed cells (or their differentiated progeny) as described herein, is provided. For treatment of conditions or disease states that are specific to a particular animal, such as a human subject, the term subject refers to that particular animal. The term “mammal” is intended to encompass a singular “mammal” and plural “mammals,” and includes, but is not limited to, humans; primates such as apes, monkeys, orangutans, and chimpanzees; canids such as dogs and wolves; felids such as cats, lions, and tigers; equids such as horses, donkeys, and zebras; food animals such as cows, pigs, and sheep; ungulates such as deer and giraffes; rodents such as mice, rats, hamsters and guinea pigs; and bears. In some preferred embodiments, a mammal is a human. The terms “non-human animals” and “non-human mammals”, as used interchangeably herein, include mammals such as rats, mice, rabbits, sheep, cats, dogs, cows, pigs, and non-human primates. The term “subject” also encompasses any vertebrate, including but not limited to mammals, reptiles, amphibians and fish. Advantageously, however, the subject is a mammal such as a human; other mammals, such as domesticated mammals (e.g., dog, cat, horse, and the like) or production mammals (e.g., cow, sheep, pig, and the like), are also encompassed by the term subject.

As used herein, the terms “sample” or “biological sample” mean any sample, including but not limited to cells, organisms, lysed cells, cellular extracts, nuclear extracts, or components of cells or organisms, extracellular fluid, and media in which cells are cultured.

The term “in vitro” as used herein refers to the technique of performing a given procedure in a controlled environment outside of a living organism. The term “in vivo”, as used herein, refers to experimentation using a whole, living organism, as opposed to a partial or dead organism or an in vitro controlled environment. “Ex vivo”, as the term is used herein, means that which takes place outside an organism. The term ex vivo is often differentiated from the term in vitro in that the tissue or cells need not be in culture; these two terms are not necessarily synonymous.

The term “pluripotent” as used herein refers to a cell with the capacity, under different conditions, to differentiate to more than one differentiated cell type, and preferably to differentiate to cell types characteristic of all three germ cell layers. Pluripotent cells are characterized primarily by their ability to differentiate to more than one cell type, preferably to all three germ layers, using, for example, a nude mouse teratoma formation assay. Pluripotency is also evidenced by the expression of embryonic stem (ES) cell markers, although the preferred test for pluripotency is the demonstration of the capacity to differentiate into cells of each of the three germ layers. In some embodiments, a pluripotent cell is an undifferentiated cell.

The term “stem cell” as used herein refers to an undifferentiated cell which is capable of proliferation and of giving rise to more progenitor cells having the ability to generate a large number of mother cells that can in turn give rise to differentiated, or differentiable, daughter cells. The daughter cells themselves can be induced to proliferate and produce progeny that subsequently differentiate into one or more mature cell types, while also retaining one or more cells with parental developmental potential. The term “stem cell” refers to a subset of progenitors that have the capacity or potential, under particular circumstances, to differentiate to a more specialized or differentiated phenotype, and which retain the capacity, under certain circumstances, to proliferate without substantially differentiating. In one embodiment, the term stem cell refers generally to a naturally occurring mother cell whose descendants (progeny) specialize, often in different directions, by differentiation, e.g., by acquiring completely individual characters, as occurs in progressive diversification of embryonic cells and tissues. Cellular differentiation is a complex process typically occurring through many cell divisions. A differentiated cell may derive from a multipotent cell which itself is derived from a multipotent cell, and so on. While each of these multipotent cells may be considered stem cells, the range of cell types each can give rise to may vary considerably. Some differentiated cells also have the capacity to give rise to cells of greater developmental potential. Such capacity may be natural or may be induced artificially upon treatment with various factors.
In many biological instances, stem cells are also “multipotent” because they can produce progeny of more than one distinct cell type, but this is not required for “stem-ness.” Self-renewal is the other classical part of the stem cell definition, and it is essential as used in this document. In theory, self-renewal can occur by either of two major mechanisms. Stem cells may divide asymmetrically, with one daughter retaining the stem state and the other daughter expressing some distinct other specific function and phenotype. Alternatively, some of the stem cells in a population can divide symmetrically into two stem cells, thus maintaining some stem cells in the population as a whole, while other cells in the population give rise to differentiated progeny only. Formally, it is possible that cells that begin as stem cells might proceed toward a differentiated phenotype, but then “reverse” and re-express the stem cell phenotype, a process often referred to as “dedifferentiation”, “reprogramming” or “retrodifferentiation” by persons of ordinary skill in the art. In the context of cell ontogeny, the adjective “differentiated” or “differentiating” is a relative term: a “differentiated cell” is a cell that has progressed further down the developmental pathway than the cell it is being compared with. Thus, a reprogrammed cell, as this term is defined herein, can differentiate to lineage-restricted precursor cells (such as a mesodermal stem cell), which in turn can differentiate into other types of precursor cells further down the pathway (such as a tissue-specific precursor, for example, a cardiomyocyte precursor), and then to an end-stage differentiated cell, which plays a characteristic role in a certain tissue type and may or may not retain the capacity to proliferate further.

The term “embryonic stem cell” is used to refer to the pluripotent stem cells of the inner cell mass of the embryonic blastocyst (see U.S. Pat. Nos. 5,843,780, 6,200,806, which are incorporated herein by reference). Such cells can similarly be obtained from the inner cell mass of blastocysts derived from somatic cell nuclear transfer (see, for example, U.S. Pat. Nos. 5,945,577, 5,994,619, 6,235,970, which are incorporated herein by reference). The distinguishing characteristics of an embryonic stem cell define an embryonic stem cell phenotype. Accordingly, a cell has the phenotype of an embryonic stem cell if it possesses one or more of the unique characteristics of an embryonic stem cell such that that cell can be distinguished from other cells. Exemplary distinguishing embryonic stem cell characteristics include, without limitation, gene expression profile, proliferative capacity, differentiation capacity, karyotype, responsiveness to particular culture conditions, and the like. The term “adult stem cell” or “ASC” is used to refer to any multipotent stem cell derived from non-embryonic tissue, including fetal, juvenile, and adult tissue. Stem cells have been isolated from a wide variety of adult tissues including blood, bone marrow, brain, olfactory epithelium, skin, pancreas, skeletal muscle, and cardiac muscle. Each of these stem cells can be characterized based on gene expression, factor responsiveness, and morphology in culture. Exemplary adult stem cells include neural stem cells, neural crest stem cells, mesenchymal stem cells, hematopoietic stem cells, and pancreatic stem cells. As indicated above, stem cells have been found resident in virtually every tissue.

The term “progenitor cell” is used herein to refer to cells that have a cellular phenotype that is more primitive (i.e., is at an earlier step along a developmental pathway or progression than is a fully differentiated cell) relative to a cell which it can give rise to by differentiation. Typically, progenitor cells also have significant or very high proliferative potential. Progenitor cells can give rise to multiple distinct differentiated cell types or to a single differentiated cell type, depending on the developmental pathway and on the environment in which the cells develop and differentiate.

The term “differentiated cell” refers to a primary cell that is not pluripotent as that term is defined herein. It should be noted that placing many primary cells in culture can lead to some loss of fully differentiated characteristics. However, simply culturing such cells does not, on its own, render them pluripotent. The transition to pluripotency requires a reprogramming stimulus beyond the stimuli that lead to partial loss of differentiated character in culture. Reprogrammed pluripotent cells are also characterized by the capacity for extended passaging without loss of growth potential, relative to primary cell parents, which generally have capacity for only a limited number of divisions in culture. Stated another way, the term “differentiated cell” refers to a cell of a more specialized cell type derived from a cell of a less specialized cell type (e.g., a stem cell such as an induced pluripotent stem cell) in a cellular differentiation process.

As used herein, the term “somatic cell” refers to a cell forming the body of an organism, as opposed to germline cells. In mammals, germline cells (also known as “gametes”) are the spermatozoa and ova which fuse during fertilization to produce a cell called a zygote, from which the entire mammalian embryo develops. Apart from the sperm and ova, the cells from which they are made (gametocytes), and undifferentiated stem cells, every other cell type in the mammalian body is a somatic cell: internal organs, skin, bones, blood, and connective tissue are all made up of somatic cells. In some embodiments the somatic cell is a “non-embryonic somatic cell”, by which is meant a somatic cell that is not present in or obtained from an embryo and does not result from proliferation of such a cell in vitro. In some embodiments the somatic cell is an “adult somatic cell”, by which is meant a cell that is present in or obtained from an organism other than an embryo or a fetus or results from proliferation of such a cell in vitro. Unless otherwise indicated, the methods for reprogramming a differentiated cell can be performed both in vivo and in vitro (where in vivo is practiced when a differentiated cell is present within a subject, and where in vitro is practiced using isolated differentiated cells maintained in culture). In some embodiments, where a differentiated cell or population of differentiated cells is cultured in vitro, the differentiated cell can be cultured in an organotypic slice culture, such as described in, e.g., Meneghel-Rozzo et al. (2004), Cell Tissue Res, 316(3):295-303. As used herein, the term “adult cell” refers to a cell found throughout the body after embryonic development.

As used herein, the term “small molecule” refers to a chemical agent including, but not limited to, peptides, peptidomimetics, amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, aptamers, nucleotides, nucleotide analogs, organic or inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 1,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 500 grams per mole, and salts, esters, and other pharmaceutically acceptable forms of such compounds.
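The molecular-weight ceilings recited in this definition can be checked for a given compound by summing atomic masses from its elemental formula. The Python sketch below is illustrative only: the rounded average atomic masses, the restriction to simple formulas without parentheses, charges or hydrates, and the helper names `molecular_weight` and `thresholds_met` are assumptions, not part of the original disclosure. The examples use 5-methylcytosine and 5-hydroxymethylcytosine, the substrate and product discussed in this document.

```python
import re

# Rounded standard average atomic masses (g/mol). Only a few elements are
# tabulated; this is an illustrative assumption, not an exhaustive table.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}

# Molecular-weight ceilings (g/mol) recited in the "small molecule" definition.
THRESHOLDS = (10_000, 5_000, 1_000, 500)


def molecular_weight(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C5H7N3O2'.

    Parentheses, charges and hydrates are not handled (simplifying assumption).
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in ATOMIC_MASS:
            raise ValueError(f"no mass tabulated for {element!r}")
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def thresholds_met(formula: str) -> list[int]:
    """Return every recited ceiling that the compound's weight falls below."""
    mw = molecular_weight(formula)
    return [t for t in THRESHOLDS if mw < t]


if __name__ == "__main__":
    for name, formula in [("5-methylcytosine", "C5H7N3O"),
                          ("5-hydroxymethylcytosine", "C5H7N3O2")]:
        print(name, round(molecular_weight(formula), 2), thresholds_met(formula))
```

Both bases weigh roughly 125 to 141 g/mol, so each falls below all four recited ceilings.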

A “nucleic acid”, as described herein, can be RNA or DNA, can be single or double stranded, and can be, for example, a nucleic acid encoding a protein of interest, a polynucleotide, an oligonucleotide, or a nucleic acid analogue, for example peptide-nucleic acid (PNA), pseudo-complementary PNA (pc-PNA), locked nucleic acid (LNA), etc. Such nucleic acid sequences include, for example, but are not limited to, nucleic acid sequences encoding proteins, for example proteins that act as transcriptional repressors, antisense molecules, ribozymes, and small inhibitory nucleic acid sequences, for example, but not limited to, RNAi, shRNAi, siRNA, micro RNAi (mRNAi), antisense oligonucleotides, etc.

As used herein, the term “DNA” is defined as deoxyribonucleic acid. The term “polynucleotide” is used herein interchangeably with “nucleic acid” to indicate a polymer of nucleosides. Typically a polynucleotide of this invention is composed of nucleosides that are naturally found in DNA or RNA (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine) joined by phosphodiester bonds. However, the term encompasses molecules comprising nucleosides or nucleoside analogs containing chemically or biologically modified bases, modified backbones, etc., whether or not found in naturally occurring nucleic acids, and such molecules may be preferred for certain applications. Where this application refers to a polynucleotide, it is understood that both DNA and RNA, and in each case both single- and double-stranded forms (and complements of each single-stranded molecule), are provided. “Polynucleotide sequence” as used herein can refer to the polynucleotide material itself and/or to the sequence information (i.e., the succession of letters used as abbreviations for bases) that biochemically characterizes a specific nucleic acid. A polynucleotide sequence presented herein is presented in a 5′ to 3′ direction unless otherwise indicated.
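Because sequences are written 5′ to 3′ and complements of single-stranded molecules are also provided, obtaining the complement in the same 5′ to 3′ convention takes two steps: complement each base, then reverse the string. The following Python sketch is a minimal illustration under stated assumptions: only unmodified A, C, G, T/U and the ambiguity code N are handled, and `reverse_complement` is a hypothetical helper name.

```python
# Complement tables; 'N' (any base) maps to itself and lower case is preserved.
# Modified bases (e.g., 5mC, 5hmC) are not represented in this sketch.
_DNA = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_RNA = str.maketrans("ACGUNacgun", "UGCANugcan")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the complementary strand of `seq`, also written 5'->3'.

    Complementing base by base yields the partner strand read 3'->5';
    reversing it restores the 5'->3' convention used for sequences here.
    """
    return seq.translate(_RNA if rna else _DNA)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ACGTTG"))           # CAACGT
    print(reverse_complement("AUGGC", rna=True))  # GCCAU
```

Applying the function twice returns the original sequence, which is a convenient sanity check.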

The term “polypeptide” as used herein refers to a polymer of amino acids. The terms “protein” and “polypeptide” are used interchangeably herein. A peptide is a relatively short polypeptide, typically between about 2 and 60 amino acids in length. Polypeptides used herein typically contain amino acids such as the 20 L-amino acids that are most commonly found in proteins. However, other amino acids and/or amino acid analogs known in the art can be used. One or more of the amino acids in a polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a phosphate group, a fatty acid group, a linker for conjugation, functionalization, etc. A polypeptide that has a nonpolypeptide moiety covalently or noncovalently associated therewith is still considered a “polypeptide”. Exemplary modifications include glycosylation and palmitoylation. Polypeptides may be purified from natural sources, produced using recombinant DNA technology, synthesized through chemical means such as conventional solid phase peptide synthesis, etc. The term “polypeptide sequence” or “amino acid sequence” as used herein can refer to the polypeptide material itself and/or to the sequence information (i.e., the succession of letters or three letter codes used as abbreviations for amino acid names) that biochemically characterizes a polypeptide. A polypeptide sequence presented herein is presented in an N-terminal to C-terminal direction unless otherwise indicated.

The term “variant” as used herein refers to a polypeptide or nucleic acid that is “substantially similar” to a wild-type polypeptide or polynucleic acid. A molecule is said to be “substantially similar” to another molecule if both molecules have substantially similar structures (i.e., they are at least 50% similar in amino acid sequence as determined by BLASTp alignment set at default parameters) and are substantially similar in at least one relevant function (e.g., effect on cell migration). A variant differs from the naturally occurring polypeptide or nucleic acid by one or more amino acid or nucleic acid deletions, additions, substitutions or side-chain modifications, yet retains one or more specific functions or biological activities of the naturally occurring molecule.
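The 50% criterion above is a percentage computed over a pairwise alignment. The Python sketch below shows that arithmetic on an alignment that has already been produced (for example, copied from a BLASTp report). It is a simplification: it counts identical residues only, whereas BLASTp also reports “positives” that credit conservative substitutions, and the gap-handling convention and the helper names `percent_identity` and `meets_similarity_threshold` are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are skipped; gap-versus-residue
    columns count as mismatches. This is one of several conventions and does
    not reproduce BLASTp's "positives" (similarity) score.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # Double-gap columns were removed above, so a == b here means a shared residue.
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def meets_similarity_threshold(aligned_a: str, aligned_b: str,
                               threshold: float = 50.0) -> bool:
    """True if identity reaches `threshold` percent (50% mirrors the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # One substitution in eight aligned residues gives 87.5% identity.
    print(percent_identity("MKTAYIAK", "MKTAHIAK"))
```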

Amino acid substitutions include alterations in which an amino acid is replaced with a different naturally-occurring or a non-conventional amino acid residue. Some substitutions can be classified as “conservative,” in which case an amino acid residue contained in a polypeptide is replaced with another naturally occurring amino acid of similar character in relation to polarity, side chain functionality or size. Substitutions encompassed by variants as described herein can also be “non-conservative,” in which an amino acid residue which is present in a peptide is substituted with an amino acid having different properties (e.g., substituting a charged or hydrophobic amino acid with an uncharged or hydrophilic amino acid), or alternatively, in which a naturally-occurring amino acid is substituted with a non-conventional amino acid. Also encompassed within the term “variant,” when used with reference to a polynucleotide or polypeptide, are variations in primary, secondary, or tertiary structure, as compared to a reference polynucleotide or polypeptide, respectively (e.g., as compared to a wild-type polynucleotide or polypeptide). Polynucleotide changes can result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence. Variants can also include insertions, deletions or substitutions of amino acids, including insertions and substitutions of amino acids and other molecules that do not normally occur in the peptide sequence that is the basis of the variant, including but not limited to insertion of ornithine, which does not normally occur in human proteins.

The term “derivative” as used herein refers to peptides which have been chemically modified, for example by ubiquitination, labeling, pegylation (derivatization with polyethylene glycol) or addition of other molecules. A molecule is also a “derivative” of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties can improve the molecule's solubility, absorption, biological half-life, etc. The moieties can alternatively decrease the toxicity of the molecule, or eliminate or attenuate an undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in Remington's Pharmaceutical Sciences, 18th edition, A. R. Gennaro, Ed., Mack Publ., Easton, Pa. (1990).

Recombinant Proteins

Typically, the proteins or polypeptides of the present invention are secreted into the growth medium of recombinant E. coli. To isolate the desired protein, the E. coli host cell carrying a recombinant plasmid is propagated and homogenized, and the homogenate is centrifuged to remove bacterial debris. The supernatant is then subjected to sequential ammonium sulfate precipitation. The fraction containing the desired protein of the present invention is subjected to gel filtration in an appropriately sized dextran or polyacrylamide column to separate the proteins. If necessary, the protein fraction may be further purified by HPLC. Alternative methods may be used as suitable. Mutations or variants of the above polypeptides or proteins are encompassed by the present invention. Variants may be modified by, for example, the deletion or addition of amino acids that have minimal influence on the properties, secondary structure, and hydropathic nature of the desired polypeptide. For example, a polypeptide may be conjugated to a signal (or leader) sequence at the N-terminal end of the protein which co-translationally or post-translationally directs transfer of the protein. The polypeptide may also be conjugated to a linker or other sequence for ease of synthesis, purification, or identification of the polypeptide.

Fragments of the above proteins are also encompassed by the present invention. Suitable fragments can be produced by several means. In the first, subclones of the gene encoding the desired protein of the present invention are produced by conventional molecular genetic manipulation by subcloning gene fragments. The subclones then are expressed in vitro or in vivo in bacterial cells to yield a smaller protein or peptide. In another approach, based on knowledge of the primary structure of the proteins of the present invention, fragments of the genes of the present invention may be synthesized by using the polymerase chain reaction (“PCR”) technique together with specific sets of primers chosen to represent particular portions of the protein. These then would be cloned into an appropriate vector for increased expression of an accessory peptide or protein. Chemical synthesis can also be used to make suitable fragments. Such a synthesis is carried out using known amino acid sequences for the proteins of the present invention. These fragments can then be separated by conventional procedures (e.g., chromatography, SDS-PAGE) and used in the methods of the present invention.
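Choosing a primer pair to amplify a chosen portion of a gene by PCR follows directly from the 5′ to 3′ convention: the forward primer copies the start of the region on the given strand, and the reverse primer is the reverse complement of the region's end. The Python sketch below is a minimal illustration only; `design_primers` and its default primer length of 20 nt are hypothetical, and practical primer design would also weigh melting temperature, GC content, secondary structure and specificity, none of which are checked here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_primers(template: str, start: int, end: int,
                   length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers flanking template[start:end].

    The forward primer copies the first `length` bases of the region; the
    reverse primer is the reverse complement of its last `length` bases.
    Both are written 5'->3'. Tm, GC content and specificity are not checked.
    """
    template = template.upper()
    if not 0 <= start < end <= len(template):
        raise ValueError("fragment coordinates fall outside the template")
    if end - start < 2 * length:
        raise ValueError("fragment is too short for non-overlapping primers")
    forward = template[start:start + length]
    reverse = reverse_complement(template[end - length:end])
    return forward, reverse


if __name__ == "__main__":
    # Toy 16-nt template with 5-nt primers, for illustration only.
    print(design_primers("ATGCCCGGGTTTACGA", 0, 16, 5))  # ('ATGCC', 'TCGTA')
```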

The nucleic acid molecule encoding a catalytically active TET family enzyme, a functional TET family derivative, or a catalytically active TET fragment thereof of the present invention can be introduced into an expression system of choice using conventional recombinant technology. Generally, this involves inserting the nucleic acid molecule into an expression system to which the molecule is heterologous (i.e., not normally present). The introduction of a particular foreign or native gene into a mammalian host is facilitated by first introducing the gene sequence into a suitable nucleic acid vector. “Vector” is used herein to mean any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc., which is capable of replication when associated with the proper control elements and which is capable of transferring gene sequences between cells. Thus, the term includes cloning and expression vectors, as well as viral vectors. The heterologous nucleic acid molecule is inserted into the expression system or vector in the proper sense (5′ to 3′) orientation and correct reading frame. Alternatively, the nucleic acid may be inserted in the “antisense” orientation, i.e., in a 3′ to 5′ direction. The vector contains the necessary elements for the transcription and translation of the inserted protein-coding sequences.
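Whether a coding insert is in the sense orientation and correct reading frame can be screened with a simple sequence check before cloning. The Python sketch below rests on stated assumptions: the insert is a bare coding sequence written 5′ to 3′, the standard genetic code's stop codons TAA, TAG and TGA apply, and `is_open_reading_frame` is a hypothetical helper. A real construct check would also examine the junctions with the vector's start codon, promoter or fusion tag.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def is_open_reading_frame(insert: str) -> bool:
    """Heuristic check that `insert` reads as one ORF in the sense orientation.

    Assumes the insert should begin with ATG, have a length divisible by 3,
    end in a stop codon and contain no premature in-frame stop codon.
    """
    insert = insert.upper()
    if len(insert) % 3 != 0 or not insert.startswith("ATG"):
        return False
    codons = [insert[i:i + 3] for i in range(0, len(insert), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


if __name__ == "__main__":
    print(is_open_reading_frame("ATGGCTTAA"))     # True: ATG GCT TAA
    print(is_open_reading_frame("ATGTAAGCTTAA"))  # False: premature TAA
```

An insert placed in the antisense orientation would generally fail this check, since its reverse complement rarely begins with ATG and runs to a single terminal stop codon.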

Recombinant genes may also be introduced into viruses, including vaccinia virus, adenovirus, and retroviruses, including lentivirus. Recombinant viruses can be generated by transfection of plasmids into cells infected with virus. Suitable vectors include, but are not limited to, viral vectors such as the lambda vector systems gt11, gt WES.tB, and Charon 4, and plasmid vectors such as pBR322, pBR325, pACYC177, pACYC184, pUC8, pUC9, pUC18, pUC19, pLG339, pR290, pKC37, pKC101, SV40, pBluescript II SK+/− or KS+/− (see “Stratagene Cloning Systems” Catalog (1993) from Stratagene, La Jolla, Calif., which is hereby incorporated by reference in its entirety), pQE, pIH821, pGEX, the pET series (see F. W. Studier et al., “Use of T7 RNA Polymerase to Direct Expression of Cloned Genes,” Gene Expression Technology Vol. 185 (1990)), and any derivatives thereof.

Recombinant molecules can be introduced into cells via transformation, particularly transduction, conjugation, mobilization, or electroporation. The DNA sequences are cloned into the vector using standard cloning procedures in the art, as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989), which is hereby incorporated by reference in its entirety. A variety of host-vector systems may be utilized to express the protein-encoding sequence of the present invention. Primarily, the vector system must be compatible with the host cell used. Host-vector systems include, but are not limited to, the following: bacteria transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA; microorganisms such as yeast containing yeast vectors; mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); and plant cells infected by bacteria.

The expression elements of these vectors vary in their strength and specificities. Depending upon the host-vector system utilized, any one of a number of suitable transcription and translation elements can be used. Different genetic signals and processing events control many levels of gene expression (e.g., DNA transcription and messenger RNA (“mRNA”) translation).

Transcription of DNA is dependent upon the presence of a promoter, which is a DNA sequence that directs the binding of RNA polymerase and thereby promotes mRNA synthesis. The DNA sequences of eukaryotic promoters differ from those of prokaryotic promoters. Furthermore, eukaryotic promoters and accompanying genetic signals may not be recognized in or may not function in a prokaryotic system, and, further, prokaryotic promoters are not recognized and do not function in eukaryotic cells. Similarly, translation of mRNA in prokaryotes depends upon the presence of the proper prokaryotic signals, which differ from those of eukaryotes. Efficient translation of mRNA in prokaryotes requires a ribosome binding site called the Shine-Dalgarno (“SD”) sequence on the mRNA. This sequence is a short nucleotide sequence of mRNA that is located before the start codon, usually AUG, which encodes the amino-terminal methionine of the protein. The SD sequences are complementary to the 3′-end of the 16S rRNA (ribosomal RNA) and probably promote binding of mRNA to ribosomes by duplexing with the rRNA to allow correct positioning of the ribosome. For a review on maximizing gene expression, see Roberts and Lauer, Methods in Enzymology, 68:473 (1979), which is hereby incorporated by reference in its entirety. Promoters vary in their “strength” (i.e., their ability to promote transcription). For the purposes of expressing a cloned gene, it is desirable to use strong promoters in order to obtain a high level of transcription and, hence, expression of the gene.
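The complementarity between an SD sequence and the 3′ end of 16S rRNA can be checked with a simple string comparison. The sketch below assumes the E. coli 16S rRNA 3′-terminal sequence 5′-GAUCACCUCCUUA-3′ as the anti-SD region and uses a hypothetical mRNA leader; it looks only for contiguous Watson-Crick pairing and ignores G·U wobble pairs and pairing energetics.

```python
# Sketch: Shine-Dalgarno (SD) / anti-SD pairing check.
# Assumption: the E. coli 16S rRNA 3'-terminal sequence 5'-GAUCACCUCCUUA-3'
# serves as the anti-SD region; the mRNA leader used below is hypothetical.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
ANTI_SD_16S_3PRIME = "GAUCACCUCCUUA"  # written 5'->3'


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Dynamic-programming longest common substring of a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def sd_core(leader: str) -> str:
    """Longest stretch of the mRNA leader able to form contiguous
    Watson-Crick pairs (antiparallel) with the 16S rRNA 3' end."""
    # A leader segment pairs with the rRNA if it equals the reverse
    # complement of part of the rRNA 3' end.
    target = reverse_complement_rna(ANTI_SD_16S_3PRIME)  # "UAAGGAGGUGAUC"
    return longest_common_substring(leader.upper().replace("T", "U"), target)


print(sd_core("UUCACAGGAGGUAACAAUG"))  # AGGAGGU
```

A real ribosome-binding-site predictor would score pairing free energy rather than exact matches; the string comparison is used here only because it mirrors the "complementary to the 3′-end of the 16S rRNA" statement directly.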

Depending upon the host cell system utilized, any one of a number of suitable promoters may be used. For instance, when cloning in E. coli, its bacteriophages, or plasmids, promoters such as the T7 phage promoter, lac promoter, trp promoter, rec A promoter, ribosomal RNA promoter, the PR and PL promoters of coliphage lambda and others, including but not limited to lac UV5, omp F, bla, lpp, and the like, may be used to direct high levels of transcription of adjacent DNA segments. Additionally, a hybrid trp-lac UV5 (tac) promoter or other E. coli promoters produced by recombinant DNA or other synthetic DNA techniques may be used to provide for transcription of the inserted gene. Bacterial host cell strains and expression vectors may be chosen which inhibit the action of the promoter unless specifically induced. In certain operons, the addition of specific inducers is necessary for efficient transcription of the inserted DNA. For example, the lac operon is induced by the addition of lactose or IPTG (isopropylthio-beta-D-galactoside). A variety of other operons, such as trp, pro, etc., are under different controls.
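The "off unless specifically induced" behavior described above can be summarized as a small Boolean model. This is a deliberately simplified sketch: it assumes a single repressor (LacI-type) acting on the operator, and it ignores leaky expression, catabolite repression, and inducer concentration. The class and attribute names are hypothetical.

```python
# Toy Boolean model of a repressor-controlled (lac-type) promoter.
# Simplifications (assumptions): no leaky expression, no catabolite
# repression, no dose-response; class and field names are hypothetical.

from dataclasses import dataclass

# Inducers named in the text (lactose acts via its isomer allolactose).
LAC_INDUCERS = {"lactose", "IPTG"}


@dataclass
class LacTypePromoter:
    repressor_present: bool = True  # e.g., host or vector supplies LacI

    def is_transcribed(self, medium_additives: set) -> bool:
        """Return True if the downstream insert would be transcribed."""
        if not self.repressor_present:
            return True  # nothing blocks the operator: effectively constitutive
        # An inducer releases the repressor from the operator.
        return bool(LAC_INDUCERS & medium_additives)


promoter = LacTypePromoter()
print(promoter.is_transcribed(set()))                # False: repressed
print(promoter.is_transcribed({"IPTG"}))             # True: induced
print(LacTypePromoter(False).is_transcribed(set()))  # True: no repressor
```

Choosing a host strain or vector that supplies the repressor corresponds to `repressor_present=True`, which keeps the inserted gene silent until the inducer is added.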

Specific initiation signals are also required for efficient gene transcription and translation in prokaryotic cells. These transcription and translation initiation signals may vary in “strength” as measured by the quantity of gene-specific messenger RNA and protein synthesized, respectively. The DNA expression vector, which contains a promoter, may also contain any combination of various “strong” transcription and/or translation initiation signals. For instance, efficient translation in E. coli requires a Shine-Dalgarno (“SD”) sequence about 7-9 bases 5′ to the initiation codon (ATG) to provide a ribosome binding site. Thus, any SD-ATG combination that can be utilized by host cell ribosomes may be employed. Such combinations include, but are not limited to, the SD-ATG combination from the cro gene or the N gene of coliphage lambda, or from the E. coli tryptophan E, D, C, B or A genes. Additionally, any SD-ATG combination produced by recombinant DNA or other techniques involving incorporation of synthetic nucleotides may be used. Depending on the vector system and host utilized, any number of suitable transcription and/or translation elements, including constitutive, inducible, and repressible promoters, as well as minimal 5′ promoter elements, may be used. The nucleic acid molecule(s) of the present invention, a promoter molecule of choice, a suitable 3′ regulatory region, and, if desired, a reporter gene are incorporated into a vector-expression system of choice to prepare the nucleic acid construct of the present invention using standard cloning procedures known in the art, such as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor: Cold Spring Harbor Laboratory Press, New York (2001), which is hereby incorporated by reference in its entirety.
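The SD-to-ATG spacing requirement can be checked mechanically when designing or inspecting a construct. The sketch below assumes the consensus SD core `AGGAGG` and counts the spacing as the number of nucleotides strictly between the last base of the SD core and the A of the ATG; other spacing conventions exist, so both the core and the 7-9 nt window are adjustable parameters. The example leader is hypothetical.

```python
# Sketch: find SD cores followed by an ATG at a chosen spacing.
# Assumptions: consensus core "AGGAGG"; "gap" = nucleotides strictly between
# the SD core and the ATG; the example leader sequence is hypothetical.

import re


def find_sd_atg(seq: str, sd_core: str = "AGGAGG",
                min_gap: int = 7, max_gap: int = 9) -> list:
    """Return (sd_start, atg_start, gap) tuples, 0-based positions."""
    seq = seq.upper().replace("U", "T")
    hits = []
    # Zero-width lookahead so overlapping SD cores are all reported.
    for m in re.finditer(f"(?={re.escape(sd_core)})", seq):
        sd_end = m.start() + len(sd_core)
        for gap in range(min_gap, max_gap + 1):
            atg = sd_end + gap
            if seq[atg:atg + 3] == "ATG":
                hits.append((m.start(), atg, gap))
    return hits


leader = "TTTAGGAGGTAAATCCATGGCT"  # hypothetical leader plus ORF start
print(find_sd_atg(leader))  # [(3, 16, 7)]
```

An empty result for a candidate construct suggests that the SD-ATG combination may need to be replaced, for example with one of the naturally occurring combinations listed above.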

In one aspect of the present invention, a nucleic acid molecule encoding a protein of choice is inserted into a vector in the sense (i.e., 5′ to 3′) direction, such that the open reading frame is properly oriented for the expression of the encoded protein under the control of a promoter of choice. Single or multiple nucleic acids may be ligated into an appropriate vector in this way, under the control of a suitable promoter, to prepare a nucleic acid construct of the present invention. Once the isolated nucleic acid molecule encoding, for example, the catalytically active TET family protein or polypeptide has been cloned into an expression system, it is ready to be incorporated into a host cell. Recombinant molecules can be introduced into cells via transformation, particularly transduction, conjugation, lipofection, protoplast fusion, mobilization, particle bombardment, or electroporation. The DNA sequences are cloned into the host cell using standard cloning procedures known in the art, as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989), which is hereby incorporated by reference in its entirety. Suitable hosts include, but are not limited to, bacteria, virus, yeast, fungi, mammalian cells, insect cells, plant cells, and the like.

Accordingly, another aspect of the present invention relates to a method of making a recombinant cell. Essentially, this method is carried out by transforming a host cell with a nucleic acid construct of the present invention under conditions effective to yield transcription of the DNA molecule in the host cell. In one embodiment, a nucleic acid construct containing the nucleic acid molecule(s) of the present invention is stably inserted into the genome of the recombinant host cell as a result of the transformation. Transient expression in protoplasts allows quantitative studies of gene expression, since the population of cells is very high (on the order of 10⁶). To deliver DNA inside protoplasts, several methodologies have been proposed, but the most common are electroporation (Neumann et al., “Gene Transfer into Mouse Lyoma Cells by Electroporation in High Electric Fields,” EMBO J. 1: 841-45 (1982); Wong et al., “Electric Field Mediated Gene Transfer,” Biochem. Biophys. Res. Commun. 107(2):584-7 (1982); Potter et al., “Enhancer-Dependent Expression of Human Kappa Immunoglobulin Genes Introduced into Mouse pre-B Lymphocytes by Electroporation,” Proc. Natl. Acad. Sci. USA 81: 7161-65 (1984)) and polyethylene glycol (PEG)-mediated DNA uptake (Sambrook et al., Molecular Cloning: A Laboratory Manual, Chap. 16, Second Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989)). During electroporation, the DNA is introduced into the cell by means of a reversible change in the permeability of the cell membrane due to exposure to an electric field. PEG transformation introduces the DNA by changing the elasticity of the membranes. Unlike electroporation, PEG transformation does not require any special equipment, and transformation efficiencies can be equally high.
Another appropriate method of introducing the gene construct of the present invention into a host cell is fusion of protoplasts with other entities, either minicells, cells, liposomes, or other fusible lipid-surfaced bodies that contain the chimeric gene (Fraley et al., Proc. Natl. Acad. Sci. USA, 79:1859-63 (1982)).

Stable transformants are preferable for the methods of the present invention, using variations of the methods above as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Chap. 16, Second Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989). Typically, an antibiotic or other compound that permits growth of only the transformed cells is added as a supplement to the media. The compound to be used will be dictated by the selectable marker element present in the plasmid with which the host cell was transformed. Suitable selectable marker genes are those which confer resistance to, e.g., gentamicin, G418, hygromycin, streptomycin, spectinomycin, tetracycline, chloramphenicol, and the like. Similarly, “reporter genes,” which encode enzymes providing for production of an identifiable compound, or other markers which indicate relevant information regarding the outcome of gene delivery, are suitable. For example, various luminescent or phosphorescent reporter genes are also appropriate, such that the presence of the heterologous gene may be ascertained visually. An example of a marker suitable for the present invention is the green fluorescent protein (GFP) gene. The isolated nucleic acid molecule encoding a green fluorescent protein can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA, including messenger RNA or mRNA), genomic or recombinant, biologically isolated or synthetic. The DNA molecule can be a cDNA molecule, which is a DNA copy of a messenger RNA (mRNA) encoding the GFP. In one embodiment, the GFP can be from Aequorea victoria (Prasher et al., “Primary Structure of the Aequorea Victoria Green-Fluorescent Protein,” Gene 111(2):229-233 (1992); U.S. Pat. No. 5,491,084 to Chalfie et al.). A plasmid encoding the GFP of Aequorea victoria is available from the ATCC as Accession No. 75547.
Mutated forms of GFP that emit more strongly than the native protein, as well as forms of GFP amenable to stable translation in higher vertebrates, are commercially available from Clontech Laboratories, Inc. (Palo Alto, Calif.) and can be used for the same purpose. The plasmid designated pTa1-GFPh (ATCC Accession No. 98299) includes a humanized form of GFP. Indeed, any nucleic acid molecule encoding a fluorescent form of GFP can be used in accordance with the subject invention. Standard techniques are then used to place the nucleic acid molecule encoding GFP under the control of the chosen cell-specific promoter. The selection marker employed will depend on the target species and/or host or packaging cell lines compatible with a chosen vector.
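The point that the selective compound is dictated by the marker carried on the plasmid can be expressed as a lookup. In the sketch below, the marker-to-agent pairs are common textbook examples chosen for illustration (they are not specified in this document), and the plasmid contents are hypothetical.

```python
# Sketch: which medium supplement selects for which marker.
# Assumption: the marker/agent pairs are common illustrative examples,
# not markers specified by this document; plasmid contents are hypothetical.

MARKER_TO_AGENTS = {
    "aacC1": {"gentamicin"},
    "neo": {"G418", "kanamycin"},
    "hph": {"hygromycin"},
    "aadA": {"streptomycin", "spectinomycin"},
    "tetA": {"tetracycline"},
    "cat": {"chloramphenicol"},
}


def selective_agents(plasmid_markers) -> set:
    """Return every agent whose toxicity the given markers counteract."""
    agents = set()
    for marker in plasmid_markers:
        if marker not in MARKER_TO_AGENTS:
            raise KeyError(f"no selective agent recorded for marker {marker!r}")
        agents |= MARKER_TO_AGENTS[marker]
    return agents


def survives(cell_markers, medium_agents: set) -> bool:
    """A cell grows only if each agent in the medium is countered by a marker."""
    return medium_agents <= selective_agents(cell_markers)


print(selective_agents(["neo"]))            # agents neutralized by neo
print(survives({"neo"}, {"G418"}))          # True: transformed cell
print(survives(set(), {"G418"}))            # False: untransformed cell
```

Only cells that took up the plasmid carry the marker, so only they satisfy `survives` in the supplemented medium; this is the enrichment for stable transformants described above.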

An “inhibitor” of a TET family enzyme, as the term is used herein, can function in a competitive or non-competitive manner, and can function, in one embodiment, by interfering with the expression of the TET family polypeptides. A TET family inhibitor includes any chemical or biological entity that, upon treatment of a cell, results in inhibition of the biological activity caused by activation of the TET family enzymes in response to cellular signals. Such an inhibitor can act by binding to the Cys-rich and double-stranded β-helix domains of the enzymes and blocking their enzymatic activity. Alternatively, such an inhibitor can act by causing conformational shifts within, or sterically hindering, the enzymes, such that enzymatic activity is abolished or reduced.

Inhibitors of TET Family Proteins and Activity

A “TET family inhibitor,” as used herein, refers to a chemical entity or biological product, or a combination of a chemical entity and a biological product. The chemical entity or biological product is preferably, but not necessarily, a low molecular weight compound, but can also be a larger compound, for example, an oligomer of nucleic acids, amino acids, or carbohydrates, including without limitation proteins, oligonucleotides, ribozymes, DNAzymes, glycoproteins, siRNAs, lipoproteins, aptamers, and modifications and combinations thereof. The term “inhibitor” refers to any entity selected from a group comprising: chemicals; small molecules; nucleic acid sequences; nucleic acid analogues; proteins; peptides; aptamers; antibodies; or fragments thereof.

A nucleic acid sequence can be RNA or DNA, can be single- or double-stranded, and can be selected from a group comprising: nucleic acids encoding a protein of interest, oligonucleotides, and nucleic acid analogues, for example peptide nucleic acid (PNA), pseudo-complementary PNA (pc-PNA), locked nucleic acid (LNA), etc. Such nucleic acid sequences include, for example, but are not limited to, nucleic acid sequences encoding proteins, for example proteins that act as transcriptional repressors, antisense molecules, ribozymes, and small inhibitory nucleic acid sequences, for example but not limited to RNAi, shRNAi, siRNA, micro RNAi (mRNAi), antisense oligonucleotides, etc.

A protein and/or peptide agent can be any protein of interest, for example, but not limited to: mutated proteins; therapeutic proteins; truncated proteins, wherein the protein is normally absent or expressed at lower levels in the cell. Proteins can also be selected from a group comprising: mutated proteins, genetically engineered proteins, peptides, synthetic peptides, recombinant proteins, chimeric proteins, antibodies, midibodies, tribodies, humanized proteins, humanized antibodies, chimeric antibodies, modified proteins and fragments thereof. In some embodiments, the agent is any chemical, entity or moiety, including without limitation synthetic and naturally-occurring non-proteinaceous entities. In certain embodiments, the agent is a small molecule having a chemical moiety. For example, chemical moieties include unsubstituted or substituted alkyl, aromatic, or heterocyclyl moieties, including macrolides, leptomycins and related natural products or analogues thereof. Inhibitors can be known to have a desired activity and/or property, or can be selected from a library of diverse compounds.

Antibody Inhibitors of TET Family Enzymes:

Antibodies that specifically bind TET family enzymes can be used for inhibition in vivo, in vitro, or ex vivo. The TET family inhibitory activity of a given antibody, or, for that matter, any TET family inhibitor, can be assessed using methods known in the art or described herein. An antibody that inhibits TET family enzymes causes a decrease in the conversion of 5-methylcytosine to 5-hydroxymethylcytosine in the DNA of a cell. Specific binding is typically defined as binding that does not recognize other antigens, such as a protein, nucleotide, chemical residue, etc., at a detectable level in the assay used.

Antibody inhibitors of TET family enzymes can include polyclonal and monoclonal antibodies and antigen-binding derivatives or fragments thereof. Well-known antigen-binding fragments include, for example, single domain antibodies (dAbs, which consist essentially of single VL or VH antibody domains), Fv fragments, including single chain Fv fragments (scFv), Fab fragments, and F(ab′)2 fragments. Methods for the construction of such antibody molecules are well known in the art. As used herein, the term “antibody” refers to an intact immunoglobulin or to a monoclonal or polyclonal antigen-binding fragment with the Fc (crystallizable fragment) region or FcRn binding fragment of the Fc region. Antigen-binding fragments may be produced by recombinant DNA techniques or by enzymatic or chemical cleavage of intact antibodies. “Antigen-binding fragments” include, inter alia, Fab, Fab′, F(ab′)2, Fv, dAb, and complementarity determining region (CDR) fragments, single-chain antibodies (scFv), single domain antibodies, chimeric antibodies, diabodies and polypeptides that contain at least a portion of an immunoglobulin that is sufficient to confer specific antigen binding to the polypeptide. The terms Fab, Fc, pFc′, F(ab′)2 and Fv are employed with standard immunological meanings [Klein, Immunology (John Wiley, New York, N.Y., 1982); Clark, W. R. (1986) The Experimental Foundations of Modern Immunology (Wiley & Sons, Inc., New York); Roitt, I. (1991) Essential Immunology, 7th Ed. (Blackwell Scientific Publications, Oxford)].

Nucleic Acid Inhibitors of TET Family Enzymes:

A powerful approach for inhibiting the expression of selected target polypeptides is through the use of RNA interference agents. RNA interference (RNAi) uses small interfering RNA (siRNA) duplexes that target the messenger RNA encoding the target polypeptide for selective degradation. siRNA-dependent post-transcriptional silencing of gene expression involves cleaving the target messenger RNA molecule at a site guided by the siRNA. “RNA interference (RNAi)” is an evolutionarily conserved process whereby the expression or introduction of RNA of a sequence that is identical or highly similar to a target gene results in the sequence-specific degradation or specific post-transcriptional gene silencing (PTGS) of messenger RNA (mRNA) transcribed from that targeted gene (see Coburn, G. and Cullen, B. (2002) J. of Virology 76(18):9225), thereby inhibiting expression of the target gene. In one embodiment, the RNA is a double-stranded RNA (dsRNA). In another embodiment, the RNA is a single-stranded RNA. This process has been described in plants, invertebrates, and mammalian cells. In nature, RNAi is initiated by the dsRNA-specific endonuclease Dicer, which promotes processive cleavage of long dsRNA into double-stranded fragments termed siRNAs. siRNAs are incorporated into a protein complex (termed “RNA induced silencing complex,” or “RISC”) that recognizes and cleaves target mRNAs. RNAi can also be initiated by introducing nucleic acid molecules, e.g., synthetic siRNAs or RNA interfering agents, to inhibit or silence the expression of target genes. As used herein, “inhibition of target gene expression” includes any decrease in expression or protein activity or level of the target gene or protein encoded by the target gene as compared to a situation wherein no RNA interference has been induced.
The decrease will be of at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 99% or more as compared to the expression of a target gene or the activity or level of the protein encoded by a target gene which has not been targeted by an RNA interfering agent.
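The percentage decrease described above is a simple ratio against a non-targeted control. The sketch below assumes that treated and control levels are already measured on the same normalized scale (for example, relative mRNA or protein levels); the numerical values are hypothetical, and the threshold list is taken from the paragraph above.

```python
# Sketch: percent knockdown relative to a non-targeted control.
# Assumption: both inputs are on the same normalized scale; the example
# values are hypothetical, the thresholds come from the text.

THRESHOLDS = (10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)  # percent


def percent_knockdown(treated: float, control: float) -> float:
    """Percent decrease of expression/activity versus the control."""
    if control <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * (control - treated) / control


def highest_threshold_met(treated: float, control: float):
    """Largest listed threshold reached by the observed decrease, or None."""
    kd = percent_knockdown(treated, control)
    met = [t for t in THRESHOLDS if kd >= t]
    return max(met) if met else None


print(percent_knockdown(25.0, 100.0))      # 75.0
print(highest_threshold_met(25.0, 100.0))  # 70
```

Computing `(control - treated) / control` rather than `1 - treated / control` keeps the arithmetic exact for round example values; either form is mathematically equivalent.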

The terms “RNA interference agent” and “RNA interference” as they are used herein are intended to encompass those forms of gene silencing mediated by double-stranded RNA, regardless of whether the RNA interfering agent comprises an siRNA, miRNA, shRNA or other double-stranded RNA molecule. “Short interfering RNA” (siRNA), also referred to herein as “small interfering RNA,” is defined as an RNA agent which functions to inhibit expression of a target gene, e.g., by RNAi. An siRNA may be chemically synthesized, may be produced by in vitro transcription, or may be produced within a host cell. In one embodiment, siRNA is a double-stranded RNA (dsRNA) molecule of about 15 to about 40 nucleotides in length, preferably about 15 to about 28 nucleotides, more preferably about 19 to about 25 nucleotides in length, and more preferably about 19, 20, 21, 22, or 23 nucleotides in length, and may contain a 3′ and/or 5′ overhang on each strand having a length of about 0, 1, 2, 3, 4, or 5 nucleotides. The length of the overhang is independent between the two strands, i.e., the length of the overhang on one strand is not dependent on the length of the overhang on the second strand. Preferably, the siRNA is capable of promoting RNA interference through degradation or specific post-transcriptional gene silencing (PTGS) of the target messenger RNA (mRNA).
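The length rules above (a paired region of about 15 to 40 nucleotides and overhangs of about 0 to 5 nucleotides chosen independently for each strand) can be captured in a small constructor. This sketch models only 3′ overhangs, uses `UU` as an illustrative default, and takes a hypothetical 19-nucleotide target site; none of these choices come from the Examples.

```python
# Sketch: assemble an siRNA duplex from a target site.
# Assumptions: only 3' overhangs are modeled; "UU" defaults and the
# 19-nt target site are illustrative, not sequences from this document.

from dataclasses import dataclass

RNA_COMP = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return seq.translate(RNA_COMP)[::-1]


@dataclass(frozen=True)
class SiRNADuplex:
    sense: str          # passenger strand, 5'->3'
    antisense: str      # guide strand, 5'->3'
    duplex_length: int  # length of the base-paired region


def design_duplex(target_dna: str, sense_overhang: str = "UU",
                  antisense_overhang: str = "UU") -> SiRNADuplex:
    """Paired region matches target_dna; each 3' overhang is set independently."""
    core = target_dna.upper().replace("T", "U")
    if not 15 <= len(core) <= 40:
        raise ValueError("paired region should be about 15-40 nt")
    for overhang in (sense_overhang, antisense_overhang):
        if len(overhang) > 5:
            raise ValueError("overhangs should be about 0-5 nt")
    return SiRNADuplex(
        sense=core + sense_overhang,
        antisense=revcomp_rna(core) + antisense_overhang,
        duplex_length=len(core),
    )


d = design_duplex("GACGTAAACGGCCACAAGT")  # hypothetical 19-nt target site
print(d.sense)  # GACGUAAACGGCCACAAGUUU
print(d.antisense)
```

Because the antisense strand is the reverse complement of the paired region, it is the strand that base-pairs with, and guides cleavage of, the target mRNA.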

siRNAs also include small hairpin (also called stem loop) RNAs (shRNAs). In one embodiment, these shRNAs are composed of a short (e.g., about 19 to about 25 nucleotide) antisense strand, followed by a nucleotide loop of about 5 to about 9 nucleotides, and the analogous sense strand. Alternatively, the sense strand may precede the nucleotide loop structure and the antisense strand may follow. These shRNAs may be contained in plasmids, retroviruses, and lentiviruses and expressed from, for example, the pol III U6 promoter, or another promoter (see, e.g., Stewart, et al. (2003) RNA April;9(4):493-501, incorporated by reference herein in its entirety). The target gene or sequence of the RNA interfering agent may be a cellular gene or genomic sequence, e.g., the TET1 sequence. An siRNA may be substantially homologous to the target gene or genomic sequence, or a fragment thereof. As used in this context, the term “homologous” is defined as being substantially identical, sufficiently complementary, or similar to the target mRNA, or a fragment thereof, to effect RNA interference of the target. In addition to native RNA molecules, RNA suitable for inhibiting or interfering with the expression of a target sequence includes RNA derivatives and analogs. Preferably, the siRNA is identical to its target. The siRNA preferably targets only one sequence. Each of the RNA interfering agents, such as siRNAs, can be screened for potential off-target effects by, for example, expression profiling. Such methods are known to one skilled in the art and are described, for example, in Jackson et al., Nature Biotechnology 21:635-637, 2003.
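The shRNA layout described above (a stem arm of about 19 to 25 nucleotides, a loop of about 5 to 9 nucleotides, then the complementary arm, in either order) can be assembled as a single string. The default loop `UUCAAGAGA` is only an illustrative 9-nucleotide choice (an assumption, not a loop specified here), and the target sequence is hypothetical.

```python
# Sketch: assemble a single-stranded shRNA (arm + loop + complementary arm).
# Assumptions: the default 9-nt loop and the example target are illustrative.

RNA_COMP = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return seq.translate(RNA_COMP)[::-1]


def build_shrna(target_sense: str, loop: str = "UUCAAGAGA",
                antisense_first: bool = True) -> str:
    """Return the shRNA 5'->3'; the two arms pair to form the stem."""
    sense = target_sense.upper().replace("T", "U")
    loop = loop.upper().replace("T", "U")
    if not 19 <= len(sense) <= 25:
        raise ValueError("stem arm should be about 19-25 nt")
    if not 5 <= len(loop) <= 9:
        raise ValueError("loop should be about 5-9 nt")
    antisense = revcomp_rna(sense)
    first, second = (antisense, sense) if antisense_first else (sense, antisense)
    return first + loop + second


hairpin = build_shrna("GACGTAAACGGCCACAAGT")  # hypothetical target
print(len(hairpin))  # 47 = 19 + 9 + 19
```

The final check in any such design is that the first arm is the reverse complement of the last arm, so the molecule can fold back on itself into a stem that Dicer processes into an siRNA.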

In addition to expression profiling, one may also screen the target sequences for similar sequences in the sequence databases to identify sequences that may have off-target effects. For example, according to Jackson et al. (Id.), 15, or perhaps as few as 11, contiguous nucleotides of sequence identity are sufficient to direct silencing of non-targeted transcripts. Therefore, one may initially screen the proposed siRNAs to avoid potential off-target silencing using sequence identity analysis by any known sequence comparison method, such as BLAST. siRNA sequences are chosen to maximize the uptake of the antisense (guide) strand of the siRNA into RISC and thereby maximize the ability of RISC to target the intended mRNA, for example TET1 mRNA, for degradation. This can be accomplished by scanning for sequences that have the lowest free energy of binding at the 5′-terminus of the antisense strand. The lower free energy leads to an enhancement of the unwinding of the 5′-end of the antisense strand of the siRNA duplex, thereby ensuring that the antisense strand will be taken up by RISC and direct the sequence-specific cleavage of, for example, the TET1 mRNA.
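Both screening ideas in the paragraph above can be prototyped in a few lines. The sketch below (1) flags non-target transcripts that share a contiguous k-mer (k = 11 to 15 per the text) with the siRNA target site and (2) compares G/C content at the 5′ ends of the two strands as a crude stand-in for the free-energy comparison. The G/C count is an assumption that replaces a proper nearest-neighbor free-energy calculation, and all sequences are hypothetical; a real screen would use BLAST against a transcript database.

```python
# Sketch: k-mer off-target screen and a crude 5'-end asymmetry check.
# Assumptions: G/C count substitutes for nearest-neighbor free energy;
# transcript names and sequences are hypothetical.


def kmers(seq: str, k: int) -> set:
    """All contiguous substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def off_target_hits(site: str, transcripts: dict, k: int = 11) -> dict:
    """Map each transcript to the k-mers it shares with the target site."""
    site_kmers = kmers(site.upper(), k)
    hits = {}
    for name, seq in transcripts.items():
        shared = site_kmers & kmers(seq.upper(), k)
        if shared:
            hits[name] = sorted(shared)
    return hits


def end_gc(seq: str, n: int = 4) -> int:
    """G/C count in the first n nt at the 5' end (crude stability proxy)."""
    return sum(base in "GC" for base in seq[:n].upper())


def guide_end_less_stable(sense_core: str, antisense_core: str, n: int = 4) -> bool:
    """True if the antisense 5' end is AT-richer than the sense 5' end."""
    return end_gc(antisense_core, n) < end_gc(sense_core, n)


site = "GACGTAAACGGCCACAAGT"  # hypothetical siRNA target site
others = {"tx1": "TTTTAAACGGCCACATTTT", "tx2": "C" * 30}
print(sorted(off_target_hits(site, others)))  # ['tx1']
```

Raising `k` toward 15 makes the screen less sensitive and more specific; the text's range of 11 to 15 nucleotides suggests checking at the lower bound first.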

siRNA molecules need not be limited to those molecules containing only RNA, but, for example, further encompass chemically modified nucleotides and non-nucleotides, and also include molecules wherein a ribose sugar molecule is substituted for another sugar molecule or a molecule which performs a similar function. Moreover, a non-natural linkage between nucleotide residues can be used, such as a phosphorothioate linkage. The RNA strand can be derivatized with a reactive functional group of a reporter group, such as a fluorophore. Particularly useful derivatives are modified at a terminus or termini of an RNA strand, typically the 3′ terminus of the sense strand. For example, the 2′-hydroxyl at the 3′ terminus can be readily and selectively derivatized with a variety of groups.

Other useful RNA derivatives incorporate nucleotides having modified carbohydrate moieties, such as 2′-O-alkylated residues or 2′-O-methyl ribosyl derivatives and 2′-fluoro ribosyl derivatives. The RNA bases may also be modified. Any modified base useful for inhibiting or interfering with the expression of a target sequence may be used. For example, halogenated bases, such as 5-bromouracil and 5-iodouracil, can be incorporated. The bases may also be alkylated; for example, 7-methylguanosine can be incorporated in place of a guanosine residue. Non-natural bases that yield successful inhibition can also be incorporated. The most preferred siRNA modifications include 2′-deoxy-2′-fluorouridine or locked nucleic acid (LNA) nucleotides and RNA duplexes containing either phosphodiester or varying numbers of phosphorothioate linkages. Such modifications are known to one skilled in the art and are described, for example, in Braasch et al., Biochemistry, 42: 7967-7975, 2003. Most of the useful modifications to the siRNA molecules can be introduced using chemistries established for antisense oligonucleotide technology. Preferably, the modifications involve minimal 2′-O-methyl modification, preferably excluding such modification. Modifications also preferably exclude modifications of the free 5′-hydroxyl groups of the siRNA. The Examples herein provide specific examples of RNA interfering agents, such as RNAi molecules that effectively target mRNA of a TET family enzyme. In some embodiments of the aspects described herein, examples of siRNA and shRNA sequences that can be used to inhibit TET family activity include, but are not limited to: SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 70, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 86, SEQ ID NO: 98, and SEQ ID NO: 92.

siRNAs useful for targeting expression of a TET family enzyme can be readily designed and tested. Chalk et al. (Nucl. Acids Res. 33:D131-D134 (2005)) describe a database of siRNA sequences and a predictor of siRNA sequences. Linked to the sequences in the database is information such as siRNA thermodynamic properties and the potential for sequence-specific off-target effects. The database and associated predictive tools enable the user to evaluate an siRNA's potential for inhibition and non-specific effects. The database is available on the world wide web at siRNA.cgb.ki.se. Synthetic siRNA molecules, including shRNA molecules, can be obtained using a number of techniques known to those of skill in the art. For example, the siRNA molecule can be chemically synthesized or recombinantly produced using methods known in the art, such as using appropriately protected ribonucleoside phosphoramidites and a conventional DNA/RNA synthesizer (see, e.g., Elbashir, S. M. et al., Nature 411:494-498 (2001); Elbashir, S. M., et al., Genes & Development 15:188-200 (2001); Harborth, J. et al., J. Cell Science 114:4557-4565 (2001); Masters, J. R. et al., Proc. Natl. Acad. Sci., USA 98:8012-8017 (2001); and Tuschl, T. et al., Genes & Development 13:3191-3197 (1999)).

Alternatively, several commercial RNA synthesis suppliers are available including, but not limited to, Proligo (Hamburg, Germany), Dharmacon Research (Lafayette, Colo., USA), Pierce Chemical (part of Perbio Science, Rockford, Ill., USA), Glen Research (Sterling, Va., USA), ChemGenes (Ashland, Mass., USA), and Cruachem (Glasgow, UK). As such, siRNA molecules are not overly difficult to synthesize and are readily provided in a quality suitable for RNAi. In addition, dsRNAs can be expressed as stem loop structures encoded by plasmid vectors, retroviruses and lentiviruses (Paddison, P. J. et al., Genes Dev. 16:948-958 (2002); McManus, M. T. et al., RNA 8:842-850 (2002); Paul, C. P. et al., Nat. Biotechnol. 20:505-508 (2002); Miyagishi, M. et al., Nat. Biotechnol. 20:497-500 (2002); Sui, G. et al., Proc. Natl. Acad. Sci., USA 99:5515-5520 (2002); Brummelkamp, T. et al., Cancer Cell 2:243 (2002); Lee, N. S., et al., Nat. Biotechnol. 20:500-505 (2002); Yu, J. Y., et al., Proc. Natl. Acad. Sci., USA 99:6047-6052 (2002); Zeng, Y., et al., Mol. Cell 9:1327-1333 (2002); Rubinson, D. A., et al., Nat. Genet. 33:401-406 (2003); Stewart, S. A., et al., RNA 9:493-501 (2003)).

In one embodiment, the RNA interference agent is delivered or administered in a pharmaceutically acceptable carrier. Additional carrier agents, such as liposomes, can be added to the pharmaceutically acceptable carrier. In another embodiment, the RNA interference agent is delivered by a vector encoding small hairpin RNA (shRNA) in a pharmaceutically acceptable carrier to the cells in an organ of an individual. The shRNA is converted by the cells after transcription into siRNA capable of targeting, for example, a TET family enzyme.

In one embodiment, the vector is a regulatable vector, such as a tetracycline-inducible vector. Methods described, for example, in Wang et al., Proc. Natl. Acad. Sci. 100: 5103-5106, using pTet-On vectors (BD Biosciences Clontech, Palo Alto, Calif.) can be used. In one embodiment, the RNA interference agents used in the methods described herein are taken up actively by cells in vivo following intravenous injection, e.g., hydrodynamic injection, without the use of a vector, illustrating efficient in vivo delivery of the RNA interfering agents. One method to deliver the siRNAs is catheterization of the blood supply vessel of the target organ. Other strategies for delivery of the RNA interference agents, e.g., the siRNAs or shRNAs used in the methods of the invention, may also be employed, such as, for example, delivery by a vector, e.g., a plasmid or viral vector, e.g., a lentiviral vector. Such vectors can be used as described, for example, in Xiao-Feng Qin et al., Proc. Natl. Acad. Sci. U.S.A., 100: 183-188. Other delivery methods include delivery of the RNA interfering agents, e.g., the siRNAs or shRNAs of the invention, using a basic peptide by conjugating or mixing the RNA interfering agent with a basic peptide, e.g., a fragment of a TAT peptide, mixing with cationic lipids, or formulating into particles.

The RNA interference agents, e.g., the siRNAs targeting TET family enzyme mRNA, may be delivered singly, or in combination with other RNA interference agents, e.g., siRNAs, such as, for example, siRNAs directed to other cellular genes. TET family enzyme siRNAs may also be administered in combination with other pharmaceutical agents which are used to treat or prevent diseases or disorders, as described herein.

Synthetic siRNA molecules, including shRNA molecules, can be obtained using a number of techniques known to those of skill in the art. For example, the siRNA molecule can be chemically synthesized or recombinantly produced using methods known in the art, such as using appropriately protected ribonucleoside phosphoramidites and a conventional DNA/RNA synthesizer (see, e.g., Elbashir, S. M. et al. (2001) Nature 411:494-498; Elbashir, S. M., W. Lendeckel and T. Tuschl (2001) Genes & Development 15:188-200; Harborth, J. et al. (2001) J. Cell Science 114:4557-4565; Masters, J. R. et al. (2001) Proc. Natl. Acad. Sci., USA 98:8012-8017; and Tuschl, T. et al. (1999) Genes & Development 13:3191-3197). Alternatively, several commercial RNA synthesis suppliers are available including, but not limited to, Proligo (Hamburg, Germany), Dharmacon Research (Lafayette, Colo., USA), Pierce Chemical (part of Perbio Science, Rockford, Ill., USA), Glen Research (Sterling, Va., USA), ChemGenes (Ashland, Mass., USA), and Cruachem (Glasgow, UK). As such, siRNA molecules are not overly difficult to synthesize and are readily provided in a quality suitable for RNAi. In addition, dsRNAs can be expressed as stem loop structures encoded by plasmid vectors, retroviruses and lentiviruses (Paddison, P. J. et al. (2002) Genes Dev. 16:948-958; McManus, M. T. et al. (2002) RNA 8:842-850; Paul, C. P. et al. (2002) Nat. Biotechnol. 20:505-508; Miyagishi, M. et al. (2002) Nat. Biotechnol. 20:497-500; Sui, G. et al. (2002) Proc. Natl. Acad. Sci., USA 99:5515-5520; Brummelkamp, T. et al. (2002) Cancer Cell 2:243; Lee, N. S., et al. (2002) Nat. Biotechnol. 20:500-505; Yu, J. Y., et al. (2002) Proc. Natl. Acad. Sci., USA 99:6047-6052; Zeng, Y., et al. (2002) Mol. Cell 9:1327-1333; Rubinson, D. A., et al. (2003) Nat. Genet. 33:401-406; Stewart, S. A., et al. (2003) RNA 9:493-501).
These vectors generally have a Pol III promoter upstream of the dsRNA and can express sense and antisense RNA strands separately and/or as hairpin structures. Within cells, Dicer processes the short hairpin RNA (shRNA) into effective siRNA. The targeted region of the siRNA molecule of the present invention can be selected from a given target gene sequence, e.g., a TET family enzyme coding sequence, beginning from about 25 to 50 nucleotides, from about 50 to 75 nucleotides, or from about 75 to 100 nucleotides downstream of the start codon. Nucleotide sequences may contain 5′ or 3′ UTRs and regions near the start codon. One method of designing an siRNA molecule of the present invention involves identifying the 23 nucleotide sequence motif AA(N19)TT (SEQ ID NO: 102) (where N can be any nucleotide) and selecting hits with at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70% or 75% G/C content. The “TT” portion of the sequence is optional. Alternatively, if no such sequence is found, the search may be extended using the motif NA(N21), where N can be any nucleotide. In this situation, the 3′ end of the sense siRNA may be converted to TT to allow for the generation of a symmetric duplex with respect to the sequence composition of the sense and antisense 3′ overhangs. The antisense siRNA molecule may then be synthesized as the complement to nucleotide positions 1 to 21 of the 23 nucleotide sequence motif. The use of symmetric 3′ TT overhangs may be advantageous to ensure that the small interfering ribonucleoprotein particles (siRNPs) are formed with approximately equal ratios of sense and antisense target RNA-cleaving siRNPs (Elbashir et al. (2001) Nature, supra; Elbashir et al. (2001) Genes & Development, supra). Analysis of sequence databases, including but not limited to NCBI, BLAST, Derwent and GenSeq, as well as tools offered by commercial oligonucleotide synthesis companies such as Oligoengine®, may also be used to screen candidate siRNA sequences against EST libraries to ensure that only one gene is targeted.
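The motif search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the inventors' design pipeline: G/C content is computed over the N19 core only, sequences are written in DNA letters (T rather than U, matching TT-style overhangs), the sense strand is taken as positions 3 to 21 followed by a TT overhang, the antisense strand as the complement of positions 1 to 21 written 5′ to 3′, and the function names are hypothetical. The sketch does not perform the database screening against EST libraries.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def _scan(target: str, accept, min_gc: float) -> list[dict]:
    """Slide a 23-nt window along target and keep windows passing accept()."""
    hits = []
    for start in range(len(target) - 22):
        window = target[start:start + 23]
        if not accept(window):
            continue
        core = window[2:21]  # positions 3-21, i.e. the N19 core
        gc = gc_fraction(core)
        if gc < min_gc:
            continue
        hits.append({
            "start": start,  # 0-based offset of the window in target
            "motif": window,
            "gc": gc,
            # Sense strand: positions 3-21, 3' end set to a TT overhang.
            "sense": core + "TT",
            # Antisense strand: complement of positions 1-21, written 5'->3'.
            "antisense": reverse_complement(window[:21]),
        })
    return hits


def find_sirna_candidates(target: str, min_gc: float = 0.30) -> list[dict]:
    """Search for AA(N19)[TT] windows; fall back to NA(N21) if none pass."""
    target = target.upper()
    # The trailing TT of AA(N19)TT is optional, so only the leading AA is required.
    hits = _scan(target, lambda w: w.startswith("AA"), min_gc)
    if not hits:
        hits = _scan(target, lambda w: w[1] == "A", min_gc)
    return hits
```

In use, one would pass a region of a TET family enzyme mRNA (as DNA letters) together with one of the G/C thresholds listed above, e.g. `find_sirna_candidates(region, min_gc=0.45)`, and then screen the returned `sense` sequences for uniqueness.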

Delivery of RNA Interfering Agents:

In general, any method of delivering a nucleic acid molecule can be adapted for use with an RNA interference (RNAi) molecule (see, e.g., Akhtar S. and Julian R L. (1992) Trends Cell. Biol. 2(5):139-144; WO94/02595, which are incorporated herein by reference in their entirety). Methods of delivering RNA interference agents, e.g., an siRNA, or vectors containing an RNA interference agent, to the target cells, e.g., a cancer cell or other desired target cells, for uptake can include injection of a composition containing the RNA interference agent, e.g., an siRNA, or directly contacting the cell, e.g., a lymphocyte, with a composition comprising an RNA interference agent, e.g., an siRNA.

However, several factors are important to consider in order to successfully deliver an RNAi molecule in vivo. For example, one should consider: (1) biological stability of the RNAi molecule, (2) prevention of non-specific effects, and (3) accumulation of the RNAi molecule in the target tissue. The non-specific effects of an RNAi molecule can be minimized by local administration, e.g., by direct injection into a tumor, cell, or target tissue, or by topical application. Local administration of an RNAi molecule to a treatment site limits the exposure of systemic tissues to the RNAi molecule, e.g., an siRNA, and permits a lower dose of the RNAi molecule to be administered. Several studies have shown successful knockdown of gene products when an RNAi molecule is administered locally. For example, intraocular delivery of a VEGF siRNA by intravitreal injection in cynomolgus monkeys (Tolentino, M J., et al (2004) Retina 24:132-138) and subretinal injections in mice (Reich, S J., et al (2003) Mol. Vis. 9:210-216) were both shown to prevent neovascularization in an experimental model of age-related macular degeneration. In addition, direct intratumoral injection of an siRNA in mice reduces tumor volume (Pille, J., et al (2005) Mol. Ther. 11:267-274) and can prolong survival of tumor-bearing mice (Kim, W J., et al (2006) Mol. Ther. 14:343-350; Li, S., et al (2007) Mol. Ther. 15:515-523). RNA interference has also shown success with local delivery to the CNS by direct injection (Dorn, G., et al. (2004) Nucleic Acids Res. 32:e49; Tan, P H., et al (2005) Gene Ther. 12:59-66; Makimura, H., et al (2002) BMC Neurosci. 3:18; Shishkina, G T., et al (2004) Neuroscience 129:521-528; Thakker, E R., et al (2004) Proc. Natl. Acad. Sci. U.S.A. 101:17270-17275; Akaneya, Y., et al (2005) J. Neurophysiol. 93:594-602) and to the lungs by intranasal administration (Howard, K A., et al (2006) Mol. Ther. 14:476-484; Zhang, X., et al (2004) J. Biol. Chem. 279:10677-10684; Bitko, V., et al (2005) Nat. Med. 11:50-55).

For administering an RNAi molecule systemically for the treatment of a disease, the RNAi molecule can either be modified or alternatively delivered using a drug delivery system; both methods act to prevent the rapid degradation of the RNAi molecule by endo- and exonucleases in vivo. Modification of the RNAi molecule or the pharmaceutical carrier can also permit targeting of the RNAi molecule to the target tissue and avoid undesirable off-target effects.

RNA interference molecules can be modified by chemical conjugation to lipophilic groups such as cholesterol to enhance cellular uptake and prevent degradation. For example, an siRNA directed against ApoB conjugated to a lipophilic cholesterol moiety was injected systemically into mice and resulted in knockdown of apoB mRNA in both the liver and jejunum (Soutschek, J., et al (2004) Nature 432:173-178). Conjugation of an RNAi molecule to an aptamer has been shown to inhibit tumor growth and mediate tumor regression in a mouse model of prostate cancer (McNamara, J O., et al (2006) Nat. Biotechnol. 24:1005-1015).

In an alternative embodiment, the RNAi molecules can be delivered using drug delivery systems such as, e.g., a nanoparticle, a dendrimer, a polymer, a liposome, or a cationic delivery system. Positively charged cationic delivery systems facilitate binding of an RNA interference molecule (negatively charged) and also enhance interactions at the negatively charged cell membrane to permit efficient uptake of an siRNA by the cell. Cationic lipids, dendrimers, or polymers can either be bound to an RNA interference molecule, or induced to form a vesicle or micelle (see, e.g., Kim S H., et al (2008) Journal of Controlled Release 129(2):107-116) that encases an RNAi molecule. The formation of vesicles or micelles further prevents degradation of the RNAi molecule when administered systemically. Methods for making and administering cationic-RNAi complexes are well within the abilities of one skilled in the art (see, e.g., Sorensen, D R., et al (2003) J. Mol. Biol. 327:761-766; Verma, U N., et al (2003) Clin. Cancer Res. 9:1291-1300; Arnold, A S et al (2007) J. Hypertens. 25:197-205).

Some non-limiting examples of drug delivery systems useful for systemic administration of RNAi include DOTAP (Sorensen, D R., et al (2003), supra; Verma, U N., et al (2003), supra), Oligofectamine, “stable nucleic acid lipid particles” (Zimmermann, T S., et al (2006) Nature 441:111-114), cardiolipin (Chien, P Y., et al (2005) Cancer Gene Ther. 12:321-328; Pal, A., et al (2005) Int. J. Oncol. 26:1087-1091), polyethyleneimine (Bonnet M E., et al (2008) Pharm. Res. August 16 Epub ahead of print; Aigner, A. (2006) J. Biomed. Biotechnol. 71659), Arg-Gly-Asp (RGD) peptides (Liu, S. (2006) Mol. Pharm. 3:472-487), and polyamidoamines (Tomalia, D A., et al (2007) Biochem. Soc. Trans. 35:61-67; Yoo, H., et al (1999) Pharm. Res. 16:1799-1804). In some embodiments, an RNAi molecule forms a complex with cyclodextrin for systemic administration. Methods for administration and pharmaceutical compositions of RNAi molecules and cyclodextrins can be found in U.S. Pat. No. 7,427,605, which is herein incorporated by reference in its entirety. Specific methods for administering an RNAi molecule for the inhibition of angiogenesis can be found in, e.g., U.S. Patent Application No. 20080152654.

In other embodiments, an RNA interference agent, e.g., an siRNA, may be injected directly into any blood vessel, such as a vein, artery, venule or arteriole, via, e.g., hydrodynamic injection or catheterization. Administration may be by a single injection or by two or more injections. The RNA interference agent is delivered in a pharmaceutically acceptable carrier. One or more RNA interference agents may be used simultaneously. In one embodiment, only one siRNA that targets a human TET family enzyme is used. In one embodiment, specific cells are targeted with RNA interference, limiting potential side effects of RNA interference caused by non-specific targeting of RNA interference. The method can use, for example, a complex or a fusion molecule comprising a cell targeting moiety and an RNA interference binding moiety that is used to deliver RNA interference effectively into cells. For example, an antibody-protamine fusion protein, when mixed with siRNA, binds siRNA and selectively delivers the siRNA into cells expressing an antigen recognized by the antibody, resulting in silencing of gene expression only in those cells that express the antigen. The siRNA or RNA interference-inducing molecule binding moiety is a protein or a nucleic acid binding domain or fragment of a protein, and the binding moiety is fused to a portion of the targeting moiety. The targeting moiety can be located at the carboxyl-terminal or amino-terminal end of the construct or in the middle of the fusion protein. A viral-mediated delivery mechanism can also be employed to deliver siRNAs to cells in vitro and in vivo as described in Xia, H. et al. ((2002) Nat. Biotechnol. 20(10):1006). Plasmid- or viral-mediated delivery mechanisms of shRNA may also be employed to deliver shRNAs to cells in vitro and in vivo as described in Rubinson, D. A., et al. ((2003) Nat. Genet. 33:401-406) and Stewart, S. A., et al. ((2003) RNA 9:493-501).
The RNA interference agents, e.g., the siRNAs or shRNAs, can be introduced along with components that perform one or more of the following activities: enhance uptake of the RNA interfering agents, e.g., siRNA, by the cell, e.g., lymphocytes or other cells; inhibit annealing of single strands; stabilize single strands; or otherwise facilitate delivery to the target cell and increase inhibition of the target gene, e.g., TET1, TET2, TET3, or CXXC4. The dose of the particular RNA interfering agent will be in an amount necessary to effect RNA interference, e.g., post-transcriptional gene silencing (PTGS), of the particular target gene, thereby leading to inhibition of target gene expression or inhibition of activity or level of the protein encoded by the target gene.

Small Molecule Inhibitors and Activators:

As used herein, the term “small molecule” refers to a chemical agent including, but not limited to, peptides, peptidomimetics, amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, aptamers, nucleotides, nucleotide analogs, organic or inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 1,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 500 grams per mole, and salts, esters, and other pharmaceutically acceptable forms of such compounds.

Antibodies Specific for TET Family Enzymes and Detecting TET Family Activity

Antibodies that can be used according to the methods described herein, for example, for detecting TET family activity, such as hydroxymethylation of cytosine, include complete immunoglobulins, antigen binding fragments of immunoglobulins, as well as antigen binding proteins that comprise antigen binding domains of immunoglobulins. Antigen binding fragments of immunoglobulins include, for example, Fab, Fab′, F(ab′)2, scFv and dAbs. Modified antibody formats have been developed which retain binding specificity, but have other characteristics that may be desirable, including, for example, bispecificity, multivalence (more than two binding sites), and compact size (e.g., binding domains alone). Single chain antibodies lack some or all of the constant domains of the whole antibodies from which they are derived. Therefore, they can overcome some of the problems associated with the use of whole antibodies. For example, single-chain antibodies tend to be free of certain undesired interactions between heavy-chain constant regions and other biological molecules. Additionally, single-chain antibodies are considerably smaller than whole antibodies and can have greater permeability than whole antibodies, allowing single-chain antibodies to localize and bind to target antigen-binding sites more efficiently. Furthermore, the relatively small size of single-chain antibodies makes them less likely than whole antibodies to provoke an unwanted immune response in a recipient.

Multiple single chain antibodies, each single chain having one VH and one VL domain covalently linked by a first peptide linker, can be covalently linked by one or more additional peptide linkers to form multivalent single chain antibodies, which can be monospecific or multispecific. Each chain of a multivalent single chain antibody includes a variable light chain fragment and a variable heavy chain fragment, and is linked by a peptide linker to at least one other chain. The peptide linker is composed of at least fifteen amino acid residues. The maximum number of linker amino acid residues is approximately one hundred.

Two single chain antibodies can be combined to form a diabody, also known as a bivalent dimer. Diabodies have two chains and two binding sites, and can be monospecific or bispecific. Each chain of the diabody includes a VH domain connected to a VL domain. The domains are connected with linkers that are short enough to prevent pairing between domains on the same chain, thus driving the pairing between complementary domains on different chains to recreate the two antigen-binding sites.

Three single chain antibodies can be combined to form triabodies, also known as trivalent trimers. Triabodies are constructed with the amino acid terminus of a VL or VH domain directly fused to the carboxyl terminus of a VL or VH domain, i.e., without any linker sequence. The triabody has three Fv heads with the polypeptides arranged in a cyclic, head-to-tail fashion. A possible conformation of the triabody is planar, with the three binding sites located in a plane at an angle of 120 degrees from one another. Triabodies can be monospecific, bispecific or trispecific.

Thus, antibodies useful in the methods described herein include, but are not limited to, naturally occurring antibodies, bivalent fragments such as F(ab′)2, monovalent fragments such as Fab, single chain antibodies, single chain Fv (scFv), single domain antibodies, multivalent single chain antibodies, diabodies, triabodies, and the like that bind specifically with an antigen.

Antibodies can also be raised against a nucleotide, polypeptide or portion of a polypeptide by methods known to those skilled in the art. Antibodies are readily raised in animals such as rabbits or mice by immunization with the gene product, or a fragment thereof. Immunized mice are particularly useful for providing sources of B cells for the manufacture of hybridomas, which in turn are cultured to produce large quantities of monoclonal antibodies. Antibody manufacture methods are described in detail, for example, in Harlow et al., 1988. While both polyclonal and monoclonal antibodies can be used in the methods described herein, it is preferred that a monoclonal antibody is used where conditions require increased specificity for a particular protein.

The term “intrabodies”, as used herein, refers to a method of targeting intracellular endogenous proteins as described in U.S. Pat. No. 6,004,940. Briefly, the method comprises the intracellular expression of an antibody capable of binding to the target. A DNA sequence is delivered to a cell; the DNA sequence contains a sufficient number of nucleotides coding for the portion of an antibody capable of binding to the target, operably linked to a promoter that will permit expression of the antibody in the cell(s) of interest. The antibody is then expressed intracellularly and binds to the target, thereby disrupting the normal actions of the target.

The terms “label” or “tag”, as used herein, refer to a composition capable of producing a detectable signal indicative of the presence of the target, such as, for example, a 5-hydroxymethylcytosine, in an assay sample. Suitable labels include radioisotopes, nucleotide chromophores, enzymes, substrates, fluorescent molecules, chemiluminescent moieties, magnetic particles, bioluminescent moieties, and the like. As such, a label is any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. The terms “labeled antibody” or “tagged antibody”, as used herein, include antibodies that are labeled by a detectable means and include, but are not limited to, antibodies that are enzymatically, radioactively, fluorescently, and chemiluminescently labeled. Antibodies can also be labeled with a detectable tag, such as c-Myc, HA, VSV-G, HSV, FLAG, V5, or HIS. The detection and quantification of, for example, 5-hydroxymethylcytosine residues present in a nucleic acid sample rely on the correlation between the amount of those residues and the intensity of the signal emitted from the detectably labeled antibody. In one embodiment, the label is a detectable marker, e.g., incorporation of a radiolabeled amino acid. Various methods of labeling polypeptides and glycoproteins are known in the art and may be used.
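Quantification from signal intensity is commonly performed against a standard curve. The Python sketch below is illustrative only: it assumes that the labeled-antibody signal is linear in the amount of 5-hydroxymethylcytosine over the range covered by a set of standards of known content, and the function names and numbers are hypothetical rather than taken from the examples of this disclosure.

```python
def fit_standard_curve(amounts: list[float], signals: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    if len(amounts) != len(signals) or len(amounts) < 2:
        raise ValueError("need at least two paired standards")
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate the amount in an unknown sample."""
    return (signal - intercept) / slope
```

For example, hypothetical standards of 0, 1, 2 and 4 arbitrary units giving signals of 10, 30, 50 and 90 yield a slope of 20 and an intercept of 10, so a sample reading 70 is estimated at 3 units. Readings outside the range of the standards should not be extrapolated this way, since the linearity assumption may not hold there.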

Examples of labels or tags for polypeptides include, but are not limited to, the following: radioisotopes or radionuclides (e.g., 3H, 14C, 15N, 35S, 43K, 52Fe, 57Co, 67Cu, 67Ga, 68Ga, 90Y, 99Tc, 111In, 123I, 125I, 131I, or 132I), fluorescent labels (e.g., FITC, phycoerythrin, rhodamine, lanthanide phosphors), enzymatic labels (e.g., horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase), quantum dots, chemiluminescent markers, biotinyl groups, predetermined polypeptide epitopes recognized by a secondary reporter (e.g., leucine zipper pair sequences, binding sites for secondary antibodies, metal binding domains, epitope tags), magnetic agents, such as gadolinium chelates, toxins such as pertussis toxin, taxol, cytochalasin B, gramicidin D, ethidium bromide, emetine, mitomycin, etoposide, teniposide, vincristine, vinblastine, colchicine, doxorubicin, daunorubicin, dihydroxyanthracenedione, mitoxantrone, mithramycin, actinomycin D, 1-dehydrotestosterone, glucocorticoids, procaine, tetracaine, lidocaine, propranolol, and puromycin and analogs or homologs thereof. In some embodiments, the label for the antibody is a fluorescent label.

A fluorescent label or tag for labeling the antibody may be Hydroxycoumarin (succinimidyl ester), Aminocoumarin (succinimidyl ester), Methoxycoumarin (succinimidyl ester), Cascade Blue (hydrazide), Pacific Blue (maleimide), Pacific Orange, Lucifer yellow, NBD, NBD-X, R-Phycoerythrin (PE), a PE-Cy5 conjugate (Cychrome, R670, Tri-Color, Quantum Red), a PE-Cy7 conjugate, Red 613, PE-Texas Red, PerCP (peridinin chlorophyll protein), TruRed (PerCP-Cy5.5 conjugate), FluorX, fluorescein isothiocyanate (FITC), BODIPY-FL, TRITC, X-Rhodamine (XRITC), Lissamine Rhodamine B, Texas Red, Allophycocyanin (APC), an APC-Cy7 conjugate, Alexa Fluor 350, Alexa Fluor 405, Alexa Fluor 430, Alexa Fluor 488, Alexa Fluor 500, Alexa Fluor 514, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 610, Alexa Fluor 633, Alexa Fluor 647, Alexa Fluor 660, Alexa Fluor 680, Alexa Fluor 700, Alexa Fluor 750, Alexa Fluor 790, Cy2, Cy3, Cy3B, Cy3.5, Cy5, Cy5.5 or Cy7.

As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a “plasmid”, which refers to a circular double stranded DNA loop into which additional nucleic acid segments can be ligated. Another type of vector is a viral vector, wherein additional nucleic acid segments can be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “recombinant expression vectors”, or more simply “expression vectors.” In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. In the present specification, “plasmid” and “vector” can be used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors, such as viral vectors (e.g., non-integrating viral vectors or replication defective retroviruses, lentiviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions. In one embodiment, lentiviruses are used to deliver one or more siRNA molecules of the present invention to a cell.

As used herein, the term “non-integrating viral vector” refers to a viral vector that does not integrate into the host genome; the expression of the gene delivered by the viral vector is temporary. Since there is little to no integration into the host genome, non-integrating viral vectors have the advantage of not producing DNA mutations by inserting at a random point in the genome. For example, a non-integrating viral vector remains extra-chromosomal and does not insert its genes into the host genome, where they could potentially disrupt the expression of endogenous genes. Non-integrating viral vectors can include, but are not limited to, the following: adenovirus, alphavirus, picornavirus, and vaccinia virus. These viral vectors are “non-integrating” viral vectors as the term is used herein, despite the possibility that any of them may, in some rare circumstances, integrate viral nucleic acid into a host cell's genome. What is critical is that the viral vectors used in the methods described herein do not, as a rule or as a primary part of their life cycle under the conditions employed, integrate their nucleic acid into a host cell's genome. An iPS cell generated using a non-integrating viral vector will not be administered to a subject unless it and its progeny are free from viral remnants.

As used herein, the term “viral remnants” refers to any viral protein or nucleic acid sequence introduced using a viral vector. Generally, integrating viral vectors will incorporate their sequence into the genome; such sequences are referred to herein as a “viral integration remnant”. However, the temporary nature of a non-integrating virus means that the expression and presence of the virus are temporary and are not passed to daughter cells. Thus, upon passaging of a re-programmed cell, the viral remnants of the non-integrating virus are essentially removed.

As used herein, the phrases “free of viral integration remnants” and “substantially free of viral integration remnants” refer to iPS cells that do not have detectable levels of an integrated adenoviral genome or an adenoviral specific protein product (i.e., a product other than the gene of interest), as assayed by PCR or immunoassay. Thus, the iPS cells that are free (or substantially free) of viral remnants have been cultured for a sufficient period of time that transient expression of the adenoviral vector leaves the cells substantially free of viral remnants.

Within an expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequence(s) in a manner which allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a target cell when the vector is introduced into the target cell). The term “regulatory sequence” is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include those which direct constitutive expression of a nucleotide sequence in many types of host cell and those which direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Furthermore, the RNA interfering agents may be delivered by way of a vector comprising a regulatory sequence to direct synthesis of the siRNAs of the invention at specific intervals, or over a specific time period. It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the target cell, the level of expression of siRNA desired, and the like.

The expression vectors of the invention can be introduced into target cells to thereby produce siRNA molecules of the present invention. In one embodiment, a DNA template, e.g., a DNA template encoding an siRNA molecule directed against a mutant allele, may be ligated into an expression vector under the control of RNA polymerase III (Pol III), and delivered to a target cell. Pol III directs the synthesis of small, noncoding transcripts whose 3′ ends are defined by termination within a stretch of 4-5 thymidines. Accordingly, DNA templates may be used to synthesize, in vivo, both sense and antisense strands of siRNAs which effect RNAi (Sui, et al. (2002) PNAS 99(8):5515).
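Because a stretch of four or more thymidines terminates Pol III transcription, an shRNA template should not contain such a run ahead of its intended terminator. The Python sketch below assembles a hypothetical hairpin template (19-nt sense sequence, loop, antisense sequence, then a TTTTT terminator) and rejects designs containing an internal T run. The 9-nt loop `TTCAAGAGA` is an assumption borrowed from commonly used shRNA vectors, the four-T threshold follows the 4-5 thymidine stretch described above, and the function names are hypothetical.

```python
import re

TERMINATOR = "TTTTT"  # Pol III termination signal placed at the 3' end
LOOP = "TTCAAGAGA"    # assumed hairpin loop; any loop of choice may be substituted

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T)."""
    return seq.translate(_COMPLEMENT)[::-1]


def internal_t_runs(seq: str, min_run: int = 4) -> list[int]:
    """Start positions of runs of >= min_run T's (potential Pol III stops)."""
    return [m.start() for m in re.finditer(f"T{{{min_run},}}", seq)]


def build_shrna_template(sense19: str, loop: str = LOOP) -> str:
    """Assemble sense-loop-antisense-terminator as a DNA template (5'->3')."""
    sense19 = sense19.upper()
    if len(sense19) != 19 or set(sense19) - set("ACGT"):
        raise ValueError("sense19 must be 19 nt of A/C/G/T")
    hairpin = sense19 + loop + reverse_complement(sense19)
    runs = internal_t_runs(hairpin)
    if runs:
        # A T run inside the hairpin would end transcription too early.
        raise ValueError(f"premature Pol III termination signal at {runs}")
    return hairpin + TERMINATOR
```

A sense sequence ending in TT, for instance, is rejected, because it joins the TT at the start of the assumed loop to form a run of four or more T's.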

As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, references to “the method” include one or more methods, and/or steps of the type described herein and/or which will become apparent to those persons skilled in the art upon reading this disclosure and so forth. It is understood that the foregoing detailed description and the following examples are illustrative only and are not to be taken as limitations upon the scope of the invention. Various changes and modifications to the disclosed embodiments, which will be apparent to those of skill in the art, may be made without departing from the spirit and scope of the present invention.

As used herein, the term “comprising” or “comprises” is used in reference to compositions, methods, and respective component(s) thereof, that are essential to the invention, yet open to the inclusion of unspecified elements, whether essential or not.

As used herein, the term “consisting essentially of” refers to those elements required for a given embodiment. The term permits the presence of additional elements that do not materially affect the basic and novel or functional characteristic(s) of that embodiment of the invention.

As used herein, the term “consisting of” refers to compositions, methods, and respective components thereof as described herein, which are exclusive of any element not recited in that description of the embodiment.

All patents, patent applications, and publications identified are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the present invention. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.

Examples

DNA Methylation and Demethylation

DNA methylation and demethylation play a vital role in mammalian development. In mammals, DNA methylation occurs primarily on cytosine in the context of the dinucleotide CpG. DNA methylation is dynamic during early embryogenesis and has a crucial role in parental imprinting, X-inactivation and silencing of endogenous retroviruses. Embryonic development is accompanied by remarkable changes in the methylation status of individual genes, whole chromosomes and, at times, the entire genome (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006)). There is active genome-wide demethylation of the paternal genome shortly after fertilization (W. Mayer, Nature 403: 501-502 (2000); J. Oswald, Curr Biol 10: 475-478 (2000)). DNA demethylation is also an important mechanism by which germ cells are reprogrammed: the development of primordial germ cells (PGC) during early embryogenesis involves widespread DNA demethylation that may be mediated by an active (i.e., replication-independent) mechanism (A. Bird, Genes Dev 16: 6-21 (2002); W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); W. Mayer, Nature 403: 501-502 (2000); J. Oswald, Curr Biol 10: 475-478 (2000)).

De novo DNA methylation and demethylation are also prominent in somatic cells during differentiation, tumorigenesis and aging. Expression of differentiation-specific genes in somatic cells is often accompanied by progressive DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007)), but it is not clear whether this process reflects an “active” process (see below) or “passive” demethylation occurring as a result of exclusion of Dnmt1 during replication. In cultured breast cancer cells, gene expression in response to oestrogen has been shown to be accompanied by waves of apparent DNA demethylation and remethylation that are clearly not coupled to replication (H. Cedar, Nature 397: 568-569 (1999); S. K. Ooi, Cell 133: 1145-1148 (2008)). Moreover, tight regulation of DNA demethylation is a likely feature of pluripotent stem cells and progenitor cells in cellular differentiation pathways, which could plausibly contribute to the ability of these cells to self-renew as well as to give rise to differentiating daughter cells. In fact, it has been proposed that pluripotency and the ability to self-renew, two important aspects of stem cell function, involve (or require) proper DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); S. Simonsson, Nat. Cell. Biol. 6: 984-990 (2004); R. Blelloch, Stem Cells 24: 2007-2013 (2006)) and, as such, could be improved by controlled expression of enzymes in the DNA demethylation pathway. Furthermore, DNA methylation is highly aberrant in cancer, with global loss of methylation as well as increased methylation leading to silencing of tumor suppressor genes (L. T. Smith, Trends Genet 23: 449-456 (2007); E. N. Gal-Yam, Annu Rev Med 59: 267-280 (2008); M. Esteller, Nature Rev Cancer 8: 286-298 (2007); M. Esteller, N Engl J Med 358: 1148-1159 (2008)); thus, it seems possible that cancer cells aberrantly turn on the DNA demethylation pathway, and that the self-renewing population of cancer stem cells is characterized by high levels of DNA demethylase activity. Overall, therefore, an understanding of the mechanism of active DNA demethylation has broad implications for our understanding of mammalian development, cell differentiation, cancer, stem cell function and aging.

DNA demethylation can proceed by two possible mechanisms: “passive” replication-dependent demethylation, and a postulated process of active demethylation whose molecular basis is still unknown (see below). The passive mechanism is fairly well understood. Normally, cytosine methylation in CpG dinucleotides is symmetric, i.e. it occurs on both strands. Hemimethylated CpGs, which are generated during replication of symmetrically methylated DNA, are recognized by DNA methyltransferase (Dnmt) 1 and are rapidly remethylated. This process is facilitated by interaction of Dnmt1 with proliferating cell nuclear antigen (PCNA), which targets Dnmt1 to the replication fork and ensures rapid restoration of the symmetrical pattern of DNA methylation (H. Leonhardt, Cell 71: 865-873 (1992); L. S. Chuang, Science 277: 1996-2000 (1997)).

If Dnmt1 activity is inhibited or Dnmt1 is excluded from the replication fork for any reason, remethylation of the CpG on the opposite strand does not occur and only one of the two daughter strands retains cytosine methylation. “Passive” demethylation is typically observed during cell differentiation, where it accompanies the increased expression of lineage-specific genes (D. U. Lee, Immunity 16: 649-660 (2002)). Over a prolonged time period (3-7 cycles of DNA replication), cytosine methylation is progressively lost from genes whose expression increases as a result of cell differentiation.
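The arithmetic behind passive dilution can be sketched as follows. This is a minimal model that assumes Dnmt1 is completely excluded at every replication (in real cells exclusion is usually partial), so each round pairs every methylated parental strand with a new unmethylated daughter strand and the methylated-strand fraction halves. The function name is hypothetical.

```python
def passive_methylation_fraction(initial: float, cycles: int) -> float:
    """Fraction of strands still methylated at a CpG after `cycles` rounds of
    replication, assuming no maintenance methylation at all (Dnmt1 fully excluded).
    Each round, the methylated fraction of strands is halved."""
    if not 0.0 <= initial <= 1.0:
        raise ValueError("initial must be a fraction between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial * 0.5 ** cycles


# Starting from a fully methylated CpG, over the 3-7 cycles mentioned above
for n in range(3, 8):
    print(n, passive_methylation_fraction(1.0, n))
```

Under this idealized assumption, only about 12% of strands remain methylated after 3 cycles and under 1% after 7. Partial Dnmt1 activity would slow the decline, which is consistent with the gradual loss described in the text.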

So far, enzymes with the ability to demethylate DNA by an active mechanism have not been identified as molecular entities. There is evidence that active DNA demethylation occurs in certain carefully controlled circumstances: for instance, the paternal genome is actively demethylated shortly after fertilization, well before DNA replication (J. B. Gurdon, Annu Rev Cell Dev Biol 22: 1-22 (2006); W. Mayer, Nature 403: 501-502 (2000)). Early development of primordial germ cells (PGCs) also involves widespread demethylation that may be mediated by active DNA demethylation (W. Reik, Nature 447: 425-432 (2007); K. Hochedlinger, Nature 441: 1061-1067 (2006); M. A. Surani, Cell 128: 747-762 (2007); P. Hajkova, Nature 452: 877-881 (2008); N. Geijsen, Nature 427: 148-154 (2004)). The mechanism of active demethylation is not known, and various disparate mechanisms have been postulated (reviewed in H. Cedar, Nature 397: 568-569 (1999); S. K. Ooi, Cell 133: 1145-1148 (2008)). These include direct removal of the methyl group (i.e. direct conversion of 5-methylcytosine (5mC) into cytosine, a thermodynamically unfavourable process that involves cleavage of a carbon-carbon bond and results in release of the methyl moiety); methylcytosine-specific DNA repair through the activity of methylcytosine-specific or T/G mismatch-specific DNA glycosylases; and methylcytosine-specific DNA deamination or other modification, such as glycosylation or hydroxymethylation, also followed by DNA repair. However, no protein (or set of proteins) with these postulated activities has been reliably identified to date.

Identification of a Novel Family of 2OG-Fe(II) Oxygenases with Predicted DNA Modification Activities

5-methylcytosine (5mC) is a minor base in mammalian DNA: it constitutes ~1% of all DNA bases and is found almost exclusively as symmetrical methylation of the dinucleotide CpG (M. Ehrlich and R. Y. Wang, Science 212, 1350 (1981)). The majority of methylated CpG is found in repetitive DNA elements, suggesting that cytosine methylation evolved as a defense against transposons and other parasitic elements (M. G. Goll, et al., Annu. Rev. Biochem. 74, 481 (2005)). Methylation patterns change dynamically in early embryogenesis, when CpG methylation is essential for X-inactivation and asymmetric expression of imprinted genes (W. Reik, Nature 447, 425 (2007)). In somatic cells, promoter methylation often correlates inversely with gene expression: CpG methylation may directly interfere with the binding of certain transcriptional regulators to their cognate DNA sequences or may enable recruitment of methyl-CpG binding proteins that create a repressed chromatin environment (A. Bird, Genes Dev. 16, 6 (2002)). DNA methylation patterns are highly dysregulated in cancer: changes in methylation status have been postulated to inactivate tumor suppressors and activate oncogenes, thus contributing to tumorigenesis (E. N. Gal-Yam, et al., Annu. Rev. Med. 59, 267 (2008)).

Trypanosomes contain base J (β-D-glucosyl-hydroxymethyluracil), a modified thymine produced by sequential hydroxylation and glucosylation of the methyl group of thymine (P. Borst and R. Sabatini, Annu. Rev. Microbiol. 62, 235 (2008)). J biosynthesis requires JBP1 and JBP2, enzymes of the 2OG- and Fe(II)-dependent oxygenase superfamily that are predicted to catalyze the first step of J biosynthesis (Z. Yu et al., Nucleic Acids Res. 35, 2107 (2007); L. J. Cliffe et al., Nucleic Acids Res. 37, 1452 (2009)). Like 5-methylcytosine, base J is associated with gene silencing: it is present in silenced copies of the genes encoding the variable surface glycoprotein (VSG) responsible for antigenic variation in the host, but is absent from the single expressed copy (P. Borst and R. Sabatini, Annu. Rev. Microbiol. 62, 235 (2008)).

We used bioinformatic analysis to predict that the putative mammalian oncogenes TET1, TET2 and TET3 belong to the class of enzymes containing 2OG-Fe(II) oxygenase domains. To identify homologs of the 2OG-Fe(II) oxygenase domains of JBP1 and JBP2, these domains were included in a profile of 2OG-Fe(II) oxygenases, and their conserved catalytic domain was used with the PSI-BLAST program to conduct a systematic search of the non-redundant database as well as the protein sequence database of microbes from environmental samples. A further search of the non-redundant database, using iterative sequence profile searches seeded with the predicted oxygenase domains of JBP1 and JBP2 and with the proteins newly detected in the first search added to the profile, recovered homologous regions in three paralogous human proteins (putative oncogenes), TET1 (CXXC6), TET2 and TET3 (R. B. Lorsbach, Leukemia, 17(3):637-41 (2003)), and in their orthologs found throughout metazoa (E < 10⁻⁵), as well as homologous domains in fungi and algae. In PSI-BLAST searches, these groups of homologous domains consistently recovered each other before recovering any other member of the 2OG-Fe(II) oxygenase superfamily, indicating that they form a distinctive family within it.

To confirm the relationship of the newly identified proteins (hereinafter referred to as the JBP1/2 family) with classical 2OG-Fe(II) oxygenases, a multiple alignment of their shared conserved domains was prepared.

Secondary structure predictions pointed to a continuous series of β-strands with an N-terminal α-helix, which is typical of the double-stranded β-helix (DSBH) fold of the 2OG-Fe(II) oxygenases (L. Aravind and E. V. Koonin, Genome Biol. 2, RESEARCH 0007 (2001)). A multiple sequence alignment showed that the new TET/JBP family displays all of the typical features of 2OG-Fe(II) oxygenases, including conservation of residues predicted to be important for coordination of the cofactors Fe(II) and 2OG. The metazoan TET proteins contain a unique conserved cysteine-rich region contiguous with the N terminus of the DSBH region. Vertebrate TET1 and TET3, and their orthologs from all other animals, also possess a CXXC domain, a binuclear Zn-chelating domain found in several chromatin-associated proteins that in certain cases has been shown to discriminate between methylated and unmethylated DNA (M. D. Allen et al., EMBO J. 25, 4503 (2006)).

Thus, we have identified the TET subfamily as having structural features characteristic of enzymes that oxidize 5-methylpyrimidines. We have shown that the domain structure of the TET subfamily proteins includes the CXXC domain, the “C” (Cys-rich) domain, and the 2OG-Fe(II) oxygenase domain containing a large, low-complexity insert.

The conserved features of the TET family of proteins include: (i) the HxD sequence (where x is any amino acid) in the extended region after the first strand, which chelates Fe(II); (ii) the GG sequence at the beginning of strand 4, which helps position the active-site arginine; (iii) the Hxs sequence (where s is a small residue) in the penultimate conserved strand, in which the H chelates Fe(II) and the small residue helps bind the 2-oxo acid; and (iv) the Rx₅a sequence (where a is an aromatic residue: F, Y or W) in the last conserved strand of the domain. The R in this motif forms a salt bridge with the 2-oxo acid, and the aromatic residue helps position the first metal-chelating histidine. The JBP1/2 family is unified by the presence of a distinctive proline in the N-terminal conserved helix (which might result in a characteristic kink in the first helix of this subfamily) and a conserved aromatic residue (typically part of an sx₂F sequence, ‘s’ being a small residue) in the first conserved strand. These observations indicated that TET1, TET2 and TET3, as well as the majority of JBP1/2 homologs from diverse phage, fungal, algal and animal sources, are catalytically active 2OG-Fe(II) oxygenases. We have shown that when the conserved HxD motif is mutated to YxA, catalytic activity is eliminated.
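The HxD motif and the YxA loss-of-function mutation can be illustrated with a short sequence utility. This is a hypothetical sketch: the toy peptide is invented, and a three-residue pattern matches many positions by chance, so in practice a hit is meaningful only within the aligned DSBH region (for TET1, the His1672/Asp1674 pair described below).

```python
import re


def find_hxd_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position of H, matched tripeptide) for every H-x-D match.
    A zero-width lookahead is used so overlapping matches are not missed."""
    return [(m.start() + 1, m.group(1)) for m in re.finditer(r"(?=(H.D))", seq)]


def mutate_hxd_to_yxa(seq: str, h_pos: int) -> str:
    """Return a copy of seq with H at h_pos (1-based) changed to Y and the D two
    residues downstream changed to A, mimicking the catalytic-dead HxD -> YxA mutant."""
    i = h_pos - 1
    if i < 0 or i + 2 >= len(seq) or seq[i] != "H" or seq[i + 2] != "D":
        raise ValueError(f"no HxD motif starting at position {h_pos}")
    return seq[:i] + "Y" + seq[i + 1] + "A" + seq[i + 3:]


toy = "MKTAHCDLLRGG"  # hypothetical peptide, not a real TET sequence
print(find_hxd_motifs(toy))
print(mutate_hxd_to_yxa(toy, 5))
```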

We have shown that vertebrate TET1 and TET3 and their orthologs from all other animals (the TET subfamily) show a fusion of the 2OG-Fe(II) oxygenase domain with an N-terminal CXXC domain, as depicted in FIG. 5. The CXXC domain is a binuclear Zn-chelating domain with 8 conserved cysteines and 1 histidine that is found in several chromatin-associated proteins, including the animal DNA methyltransferase DNMT1 and the methyl-CpG-binding protein MBD1. Different versions of this domain have been shown to bind specifically to DNA containing methylated cytosine, either on both strands or on just a single strand. This feature, seen in light of the relationship with JBP1/2 and the phage proteins, suggested to us that the TET subfamily operates on methylcytosine to catalyze oxidation or oxidative removal of the methyl group.

Additionally, the TET subfamily is characterized by a unique conserved domain (here termed the Cys-rich or “C” domain). This domain is contiguous with the N-terminus of the 2OG-Fe(II) oxygenase domain and contains at least 8 conserved cysteines and 1 histidine that are likely to comprise a binuclear metal cluster. By analogy with the position of the N-terminal extensions of the AlkB protein, at least part of the “C” domain could be similarly positioned and form an extended DNA recognition surface. The 2OG-Fe(II) oxygenase domain of the TET family contains a large, low-complexity insert predicted to have a predominantly unstructured conformation. It occurs within the DSBH fold in exactly the same position as an unstructured insert seen in the prolyl hydroxylases. Based on the structure of the prolyl hydroxylases, this insert is likely to be located on the exterior surface of the protein, stacked against one face of the DSBH. Its persistence across the entire family despite lack of sequence conservation indicates that it might form a generalized protein-protein interaction surface.

Thus, the total weight of the contextual information available for the JBP1/2 family supports a conserved modification function for the entire family, namely oxidation of 5-methylpyrimidines in DNA or RNA. Without wishing to be limited or bound by a theory, we envision that the activity of this family of enzymes need not be restricted to hydroxymethylation of 5-methylcytosine; certain family members could act as dioxygenases for other pyrimidines, either free, in small nucleic acids such as microRNAs, in DNA or in RNA, or could mediate further oxidation steps beyond hydroxymethylation, for instance to an aldehyde or an acid.

Experimental Analysis of the TET Subfamily: Cells Expressing TET1 Show Decreased Staining for 5-Methylcytosine

To test the computational predictions for the human TET subfamily, all three human TET proteins were subcloned into mammalian expression vectors with tandem FLAG and HA tags. Importantly, TET1/CXXC6 is known to be associated with the development of acute myeloid leukemia in the context of t(10;11)(q22;q23) translocations, which result in the expression of TET1:MLL fusion proteins that maintain the predicted catalytic domain of TET1 while losing the SET methyltransferase domain of MLL (R. B. Lorsbach, Leukemia, 17(3):637-41 (2003); R. Ono, Cancer Res 62: 4075-4080 (2002)).

To examine the effect of TET1 on overall DNA methylation levels, FLAG- and HA-tagged full-length TET1 or its C-terminal Cys-rich + DSBH domains (hereafter referred to as the C+D domain) was expressed in human embryonic kidney (HEK) 293 cells. Two days later, we stained the cells for 5-methylcytosine using a 5-methylcytosine-specific antibody and for TET1 expression using an antibody to the HA epitope tag. Mock-transfected cells showed substantial variation in 5-methylcytosine staining intensity (FIG. 6), either because 5-methylcytosine levels vary from cell to cell or because the accessibility of 5-methylcytosine to the antibody differs among cells for technical reasons (e.g., incomplete denaturation of DNA).

We found that cells transfected with wild-type TET1 showed a strong correlation of HA positivity with decreased staining for 5-methylcytosine, both visually and by quantification (FIG. 6). Untransfected HA-low cells showed a spread of 5-methylcytosine staining intensity similar to that of mock-transfected cells, whereas productively transfected HA-high cells showed uniformly low 5-methylcytosine staining intensity (FIG. 6).

We have demonstrated that overexpression of catalytically active TET subfamily proteins leads to decreased staining with a monoclonal antibody directed against 5-methylcytosine; in particular, catalytically active TET1 causes a substantial decrease in nuclear staining for 5-methylcytosine (5mC) in transfected HEK293 cells. We have also quantified the relation between 5-methylcytosine staining and HA/TET1 staining on a per-cell basis using the CellProfiler program. Cells expressing full-length TET1 show a substantial decrease in 5-methylcytosine staining relative to mock-transfected cells (FIG. 6). The loss of 5-methylcytosine staining is even more striking in cells expressing only the C+D domain of TET1, but is far less apparent in cells expressing a mutant C+D domain in which two of the predicted catalytic residues of the 2OG-Fe(II) oxygenase domain, His1672 and Asp1674, are mutated to tyrosine and alanine, respectively (numbers refer to residues in full-length TET1).

We used the CellProfiler program to quantify the relation between 5-methylcytosine staining and HA staining on a per-cell basis. Mock-transfected cells show a wide spread in 5-methylcytosine staining intensity, most likely because access of the anti-5-methylcytosine antibody to the methylated cytosine requires complete denaturation of the DNA. In the population of cells transfected with full-length TET1 or the C+D domain of TET1, the 5-methylcytosine staining intensity of the untransfected (HA-low) subpopulation overlaps with that of the mock-transfected population, but the productively transfected (HA-high) population shows a clear decrease in the intensity of 5-methylcytosine staining (FIG. 6). In contrast, HA-positive cells expressing the mutant H1672Y/D1674A C+D domain show a distribution of 5-methylcytosine staining intensity that is much more similar to that of the mock-transfected cells.
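The HA-low versus HA-high comparison can be sketched as a simple gate on per-cell measurements, such as those exported from CellProfiler. The threshold, the intensity values and the function name below are hypothetical; the text does not state how the HA gate was set, and a real analysis would compare whole distributions rather than only medians.

```python
from statistics import median


def split_by_ha(ha: list[float], mc5: list[float], threshold: float) -> tuple[list[float], list[float]]:
    """Split per-cell 5mC intensities into HA-low and HA-high groups.
    `ha` and `mc5` are paired per-cell intensities; `threshold` is a hypothetical gate."""
    if len(ha) != len(mc5):
        raise ValueError("ha and mc5 must have one value per cell")
    low = [m for h, m in zip(ha, mc5) if h < threshold]
    high = [m for h, m in zip(ha, mc5) if h >= threshold]
    return low, high


# Invented per-cell values for illustration only
ha = [0.1, 0.2, 0.15, 0.9, 1.1, 0.95]
mc5 = [0.8, 0.5, 0.7, 0.2, 0.25, 0.15]
low, high = split_by_ha(ha, mc5, threshold=0.5)
print(median(low), median(high))
```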

Notably, we also found that cells expressing the C+D domain display a distinct increase in nuclear size, which again is much less apparent in cells expressing the mutant protein; we also quantified this effect.

A Novel Nucleotide in DNA from Cells Expressing TET1

The loss of 5-methylcytosine staining in TET1-expressing cells suggested to us that the 5-methylcytosine in these cells was being modified in some way. To detect the modified nucleotide, we developed an assay based on thin-layer chromatography (TLC) to detect the relative levels of cytosine and 5-methylcytosine in cells. Herein, we demonstrate that TET1 expression leads to the generation of a novel nucleotide. Briefly, DNA is subjected to cleavage with MspI, a methylation-insensitive enzyme that cuts at the sequence CCGG regardless of whether or not the internal CpG is methylated on cytosine. The resulting fragments, whose 5′ ends derive from the dinucleotide CpG, carry either cytosine or 5-methylcytosine at the 5′ end (H. Cedar et al., Nucleic Acids Res. 6, 2125 (1979)). The DNA is then treated with calf intestinal phosphatase (CIP), end-labeled with polynucleotide kinase (PNK), hydrolysed to dNMPs with snake venom phosphodiesterase (SVPD) and DNase I, and the nucleotides are separated by thin-layer chromatography.

Our TLC assay detected a novel nucleotide in genomic DNA of cells transfected with catalytically active full-length TET1 or its catalytic fragment (C+D); the appearance of this novel nucleotide depended both on 5-methylcytosine and on the expression of catalytically active full-length TET1 or C+D in HEK293 cells. To determine whether TET1 altered the relative levels of unmethylated and methylated cytosine, HEK293 cells were transfected with control vector or with vector encoding full-length or C+D TET1 or their mutant versions, after which DNA was extracted from the entire transfected population and subjected to digestion, end-labeling and TLC. Compared with MspI-digested DNA from cells transfected with the control vector, MspI-digested DNA from cells expressing wild-type, but not mutant, full-length or C+D TET1 yielded a novel labeled spot migrating between dCMP and dTMP. Thus catalytically active (wt), but not catalytically inactive (mut), TET1 alters the relative levels of unmethylated and methylated cytosine in transfected HEK293 cells and results in the appearance of the novel nucleotide; this was particularly apparent with the catalytic C+D fragment. The intensity of this spot correlated with a decrease in the intensity of the 5-methyl-dCMP (5m-dCMP) spot, strongly suggesting that the spot was derived from 5-methyl-dCMP and not from dCMP. We also demonstrate that neither the 5-methylcytosine spot nor the new spot was observed when the DNA was digested with HpaII, a methylation-sensitive isoschizomer of MspI that cuts DNA at the sequence CCGG only if the internal CpG dinucleotide is unmethylated, again indicating that the spot was likely to be a derivative of 5-methyl-dCMP. This is because both 5-methylcytosine and cytosine are present at the 5′ end of MspI fragments and are therefore labeled by polynucleotide kinase, whereas only cytosine is present at the 5′ end of DNA fragments produced by the methylation-sensitive isoschizomer HpaII.

To confirm that the spot was not an artefact of MspI digestion, we tested another methylation-insensitive enzyme, Taqα1, whose restriction site (TCGA) includes a central CG dinucleotide. As with MspI, both 5-methylcytosine and cytosine are present at the 5′ end of DNA fragments produced by Taqα1 and are therefore labeled. Taqα1 gave the same results as MspI: once again, the novel spot was observed in Taqα1-digested DNA from cells expressing wild-type, but not mutant, full-length or C+D TET1, and again the intensity of the spot correlated with a decrease in the intensity of the 5-methyl-dCMP spot.
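The logic of which nucleotides end up at labeled 5′ ends after MspI, HpaII or Taqα1 digestion can be sketched as a small simulation. This sketch makes simplifying assumptions: it models a single strand only (both sites are palindromic, so the complementary strand gives mirror-image ends), it considers only methylation of the internal CpG cytosine, and it ignores other modifications such as outer-cytosine methylation. The function and table names are hypothetical.

```python
# Recognition site and whether 5mC at the internal CpG blocks cutting.
# All three enzymes cut after the first base (C^CGG or T^CGA), so the CpG cytosine
# becomes the 5' nucleotide of the downstream fragment and is the one labeled by PNK.
ENZYMES = {
    "MspI": ("CCGG", False),   # methylation-insensitive
    "HpaII": ("CCGG", True),   # blocked by 5mC at the internal CpG
    "Taqα1": ("TCGA", False),  # methylation-insensitive
}


def labeled_5prime_ends(seq: str, methylated: set[int], enzyme: str) -> list[str]:
    """Return the labeled 5'-terminal nucleotides ("C" or "5mC") produced by cutting
    one strand of `seq`. `methylated` holds 0-based positions of 5mC."""
    site, blocked_by_5mc = ENZYMES[enzyme]
    ends = []
    start = seq.find(site)
    while start != -1:
        cpg_c = start + 1  # the C of the CpG inside the recognition site
        is_methylated = cpg_c in methylated
        if not (blocked_by_5mc and is_methylated):
            ends.append("5mC" if is_methylated else "C")
        start = seq.find(site, start + 1)
    return ends


seq = "AACCGGTTCCGGATCGAA"   # toy sequence with two CCGG sites and one TCGA site
methylated = {3, 14}        # first CCGG and the TCGA site carry 5mC at the CpG
for enzyme in ENZYMES:
    print(enzyme, labeled_5prime_ends(seq, methylated, enzyme))
```

Only the methylation-insensitive enzymes report 5mC at labeled ends, which is why a 5mC-derived spot (such as the novel nucleotide) appears after MspI or Taqα1 digestion but not after HpaII digestion.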

FIG. 7 shows these experiments represented as line scans of the phosphorimager signal from the labeled spots on the TLC plate. These experiments confirmed the correlation between loss of 5-methylcytosine and appearance of the novel nucleotide in cells expressing full-length (FL) or C+D TET1, but not FL mut or C+D mut.
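Quantifying spots from a line scan amounts to integrating background-subtracted intensity over a pixel window for each spot and expressing each spot as a share of the total. The sketch below assumes fixed, hand-chosen windows and a flat background; the profile values, window positions and spot names are hypothetical, and the text does not specify how the phosphorimager scans were integrated.

```python
def integrate_peaks(profile: list[float], windows: dict[str, tuple[int, int]],
                    background: float) -> dict[str, float]:
    """Sum background-subtracted intensity over each named [start, stop) pixel window.
    Negative values after background subtraction are clipped to zero."""
    return {name: sum(max(profile[i] - background, 0.0) for i in range(lo, hi))
            for name, (lo, hi) in windows.items()}


def percent_of_total(areas: dict[str, float]) -> dict[str, float]:
    """Express each integrated spot as a percentage of all integrated spots."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("no signal above background")
    return {name: 100.0 * area / total for name, area in areas.items()}


# Invented one-dimensional scan and windows, for illustration only
profile = [1, 1, 9, 11, 1, 1, 5, 5, 1, 1, 3, 1]
windows = {"dCMP": (2, 4), "5m-dCMP": (6, 8), "new spot": (10, 11)}
areas = integrate_peaks(profile, windows, background=1)
print(areas)
print(percent_of_total(areas))
```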

Identification of the Novel Nucleotide as 5-Hydroxymethyl-dCMP

We identified the novel nucleotide produced by TET1 expression as 5-hydroxymethyl-dCMP. To do so, we subcloned full-length and C+D TET1 and their mutant versions into a vector containing a cassette in which expression of human CD25 was driven by an internal ribosome entry site (IRES). This strategy allowed identification and sorting of transfected cells that co-expressed TET1 and CD25, and the acquisition of samples from preparative TLC.

We generated expression plasmids based on pEF1 and used them to express full-length TET1 or its C+D catalytic domain, either wild-type (wt) or mutant (mut), together with an IRES-human CD25 cassette, so that successfully transfected cells were marked by CD25 expression. The cells were sorted for CD25 expression to enrich for the TET1-expressing cell population, and genomic DNA was isolated and subjected to MspI cleavage, treatment with calf intestinal phosphatase (CIP), end-labeling with polynucleotide kinase (PNK), hydrolysis to dNMPs with snake venom phosphodiesterase (SVPD) and DNase I, and thin-layer chromatography. The TLC assay showed that the novel nucleotide (“new spot”) is observed only in DNA from cells transfected with the catalytically active C+D fragment of TET1, and not in DNA from cells transfected with empty vector or with the catalytically inactive mutant version of C+D. FIG. 8 depicts these experiments as line scans of the labeled spots on the TLC plate, obtained by phosphorimager analysis.

Experiments to determine the identity of the unknown nucleotide by mass spectrometry were performed. Ultra-performance liquid chromatography was carried out using an Acquity UPLC system (Waters Corp., Milford, Mass.) with a Waters HSS C18 column (1.0 mm i.d. × 50 mm, 1.8-μm particles). The mobile phases were 0.1% aqueous ammonium formate (A, pH 6.0) and methanol (B). After initial equilibration at 100% A, methanol was increased linearly from 0% to 50% over 15 minutes and then to 100% within 10 minutes, held at 100% for 2 minutes, and returned to 0% over 10 minutes to flush the column. The column was then allowed to re-equilibrate by holding 100% A for 7 min prior to subsequent analyses. The flow rate was 0.05 ml min⁻¹, and the eluant was directly injected into the mass spectrometer. Mass spectrometry analysis was carried out using a Q-Tof Premier mass spectrometer (Waters Corp., Milford, Mass.) fitted with an electrospray interface. Data were acquired and processed with MassLynx 4.1 software. Instrument tuning and mass calibration were carried out using 1 mM sodium acetate solution (in 1:4 H₂O:ACN). Mass spectra were recorded in negative mode within m/z 300-500 for LC/MS runs and within m/z 50-350 for LC/MS/MS runs. The quadrupole was set to allow all ions to pass through in the LC/MS runs, and was set to select the specific mass of the targeted parent ions for fragmentation in the LC/MS/MS runs. For all characterizations, ultrapure water was obtained from a Milli-Q water purification system (Millipore). All solvents and modifiers used were mass-spectrometry grade. Methanol was purchased from Fisher Scientific, and ammonium formate was obtained from Sigma. To determine the identity of the unknown nucleotide (336.06 Da signal in negative mode), LC/MS and LC/MS/MS experiments were performed in which the samples eluted from the TLC plate were frozen, lyophilized, and resuspended in water for on-line LC/MS and LC/MS/MS analysis.
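The solvent program above can be written as a table of breakpoints with linear interpolation between them. This is a sketch that assumes every transition, including the ramp to 100% methanol and the return to 0%, is linear (the text states linearity only for the first ramp); the breakpoint list and function name are hypothetical.

```python
# (time in min, % methanol (B)) breakpoints of the gradient described above
GRADIENT = [
    (0.0, 0.0),     # start at 100% A
    (15.0, 50.0),   # linear 0 -> 50% B over 15 min
    (25.0, 100.0),  # to 100% B within 10 min (assumed linear)
    (27.0, 100.0),  # hold at 100% B for 2 min
    (37.0, 0.0),    # back to 0% B over 10 min (assumed linear)
    (44.0, 0.0),    # hold 100% A for 7 min to re-equilibrate
]


def percent_methanol(t: float) -> float:
    """Percent methanol at time t (min), by linear interpolation between breakpoints."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time outside the programmed run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return GRADIENT[-1][1]


for t in (0, 7.5, 20, 26, 32, 40):
    print(t, percent_methanol(t))
```

With these assumptions each run lasts 44 min, so at 0.05 ml min⁻¹ it consumes about 2.2 ml of mobile phase.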

The region containing the unknown spot was excised from preparative TLC plates, and XCMS was used to compare the ion intensities of the signals obtained by processing DNA from cells expressing the wild-type versus the mutant version of TET1 C+D (FIG. 9A). After background subtraction (of the values obtained from a control run of the solvent gradient with Milli-Q water injection), a single species of 336.0582 Da was the only one that showed a significant difference in intensity between the two samples: the intensity of the signal from this species in the wild-type sample was ~19-fold greater than that in the mutant sample, whereas for all other species the signal intensity ratio was smaller than 2. Considering the large errors involved in extracting samples by scraping TLC plates, species with signal intensity ratios smaller than 2 can reasonably be ignored. The mass of 336.06 Da is consistent with a molecular formula of C₁₀H₁₅N₃O₈P⁻, the deprotonated form of 5-hydroxymethyl-dCMP, the nucleotide of 5-hydroxymethylcytosine, an oxidation product that, from our bioinformatic analysis, could reasonably be produced by TET1.
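The agreement between the observed 336.0582 Da signal and the proposed formula can be checked by computing the monoisotopic mass of the C₁₀H₁₅N₃O₈P⁻ anion and the error in parts per million. This is a minimal sketch using standard monoisotopic isotope masses; the function names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "P": 30.97376199}
ELECTRON = 0.00054858


def ion_mass(formula: dict[str, int], charge: int = -1) -> float:
    """Monoisotopic mass of an ion: sum of atomic masses, plus one electron mass per
    negative charge (or minus one per positive charge)."""
    neutral = sum(MONO[element] * count for element, count in formula.items())
    return neutral - charge * ELECTRON


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


hm_dcmp_anion = {"C": 10, "H": 15, "N": 3, "O": 8, "P": 1}  # [5-hydroxymethyl-dCMP - H]-
theoretical = ion_mass(hm_dcmp_anion)
print(round(theoretical, 4), round(ppm_error(336.0582, theoretical), 1))
```

The theoretical value is about 336.0602 Da, so the observed species lies within roughly 6 ppm of it, which is consistent with the assignment.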

LC/MS/MS runs were carried out at several collision energies (15, 25, 35 V (not shown) and 50 V) in both positive and negative modes. 5-Hydroxymethylcytosine from T4 phage was used as a standard for comparison. For direct comparison, all LC and MS/MS parameters were kept exactly the same for the unknown nucleotide and the 5-hydroxymethylcytosine standard in each MS/MS run. After background subtraction (of the MS/MS spectrum of the wild-type blank sample) using MATLAB 7.1 (The MathWorks, Inc.), the MS/MS spectra of the unknown nucleotide looked exactly the same as the corresponding MS/MS spectra from the T4 5-hydroxymethylcytosine standard.

Since 5-hydroxymethylcytosine is not commercially available, a biological source of this nucleotide was sought. The genomes of T-even phages contain 5-hydroxymethylcytosine, which is normally almost completely glucosylated during infection of their E. coli hosts. This modification protects the phages from bacterial restriction enzymes such as McrBC, which recognise and cleave DNA containing either 5-methylcytosine or 5-hydroxymethylcytosine. If these phages are grown in E. coli ER1656, a strain that lacks GalU (the enzyme that catalyses formation of the glucose donor UDP-glucose) and also carries mcrA and mcrB1 mutations, their DNA remains unglucosylated and can be used as a source of 5-hydroxymethylcytosine. Indeed, TLC analysis showed that DNA from T4 phage grown in galU mcrA mcrB1 E. coli hosts yields only 5-hydroxymethylcytosine and no cytosine or 5-methylcytosine. This 5-hydroxymethylcytosine migrates similarly to the novel nucleotide obtained from TET1-expressing cells, which is present only in cells expressing the wild-type C+D domains. As shown in FIG. 9, LC/MS/MS runs carried out in negative mode with collision energies of 15 V and 25 V established that the unknown nucleotide is identical to authentic 5-hydroxymethylcytosine obtained from T4 phage grown in GalU-deficient E. coli hosts.

Physiological Importance of TET1 in Gene Regulation

We have shown that a recombinant protein comprising the catalytic domain (C+D) of human TET1, expressed using a baculovirus expression vector in insect Sf9 cells, is active in converting 5-methylcytosine to 5-hydroxymethylcytosine in vitro. Further, the catalytically active TET1 fragment shows an absolute requirement for Fe(II) and 2OG. Omission of ascorbate did not result in a significant decrease in catalytic activity, most likely because dithiothreitol was included in the reaction to counteract the strong tendency of TET1-CD to oxidize (L. Que Jr., et al., Chem. Rev. 96, 2607 (1996); C. Loenarz and C. J. Schofield, Nat. Chem. Biol. 4, 152 (2008); L. E. Netto and E. R. Stadtman, Arch. Biochem. Biophys. 333, 233 (1996)). Recombinant TET1-CD was specific for 5-methylcytosine, as conversion of thymine to hydroxymethyluracil (hmU) was not detected.

We analyzed the purified protein on an SDS polyacrylamide gel stained with Coomassie Blue, in which lane 1 contained molecular weight markers; lanes 2-4 were loaded with the indicated amounts of bovine serum albumin (BSA) (2, 1 and 0.5 micrograms); and lanes 5-8 were loaded with eluted protein from the FLAG affinity columns used to purify C+D and the C+D mutant (mut). Lanes 5 and 6 contained 1.6 micrograms of C+D and mut, respectively, and lanes 7 and 8 contained 5 micrograms of C+D and mut, respectively. The band at around 90 kDa represents the TET1 fragment, and the bands of higher apparent molecular weight are oxidized versions of the same fragment. We also performed anti-FLAG western blots loaded with different fractions from the FLAG affinity columns used to purify C+D and C+D mut (Lys = cell lysate; sol = soluble; ins = insoluble; FT = flowthrough; W1 = wash 1; W2 = wash 2; Fg E1 = 1st elution with FLAG peptide; Fg E2 = 2nd elution with FLAG peptide; low pH = final elution of the column with low-pH buffer). We showed that the recombinant C+D fragment of TET1 is catalytically active in vitro and can produce hydroxymethyl-dCMP (hm-dCMP) using either the fully methylated oligo 1 or the hemimethylated oligo 3 as substrate, whereas the catalytically inactive mutant C+D cannot. We also measured the relative activity of the recombinant C+D fragment of TET1 in the presence of various combinations of Fe²⁺, ascorbic acid, α-KG and EDTA. Briefly, 10 μg of double-stranded DNA oligonucleotides containing a methylated Taqα1 site were incubated with 3 μg of GST-SMCX in a buffer containing 1 mM α-KG, 2 mM ascorbic acid and 75 μM Fe²⁺ for 3 hours at 37 °C. The enzyme to substrate ratio was 1:10. Oligonucleotides were incubated under identical conditions with purified FlagHA-CD(DHD) as a negative control. Recovered oligonucleotides were digested with Taqα1, end-labeled with T4 PNK and γ-³²P-ATP, and then hydrolyzed to dNMPs with DNase I and snake venom phosphodiesterase. The dNMPs were resolved on cellulose TLC plates, and their relative amounts were quantitated using a phosphorimager. Each condition was performed in triplicate. FIG. 10 shows the relative activity of the recombinant C+D fragment of TET1 in the presence of various combinations of Fe²⁺, ascorbic acid, α-KG and EDTA.
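One way to summarize such triplicate TLC readouts is to express hm-dCMP as a percentage of all labeled cytosine species (C, 5mC and hmC) in each replicate and report the mean ± standard deviation. This is a hypothetical sketch: the replicate intensities are invented, and the text does not state exactly how relative activity in FIG. 10 was normalized.

```python
from statistics import mean, stdev


def percent_hmc(c: float, mc: float, hmc: float) -> float:
    """hm-dCMP as a percentage of all labeled cytosine species in one lane."""
    total = c + mc + hmc
    if total <= 0:
        raise ValueError("no labeled cytosine species")
    return 100.0 * hmc / total


# Invented (dCMP, 5m-dCMP, hm-dCMP) phosphorimager intensities for three replicates
replicates = [(10.0, 60.0, 30.0), (12.0, 58.0, 30.0), (11.0, 61.0, 28.0)]
values = [percent_hmc(*r) for r in replicates]
print(f"{mean(values):.1f} ± {stdev(values):.1f} % hmC")
```

The sample standard deviation (`statistics.stdev`) is used because the three replicates are a sample, not the whole population of possible reactions.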

We demonstrated the physiological importance of TET1 in gene regulation. FIG. 11A demonstrates that Tet1 mRNA is strongly upregulated after 8 h of stimulation of mouse dendritic cells (DC) with LPS, a standard activating stimulus for DC. FIGS. 11B-11I show the changes in Tet1, Tet2 and Tet3 mRNA levels in mouse ES cells that have been induced to differentiate by withdrawal of leukemia inhibitory factor (LIF) and addition of retinoic acid (RA). We cultured v6.5 mouse ES cells on gelatin-coated wells in DMEM supplemented with 15% FBS and 10³ units/ml of LIF. Twenty-four hours after plating (D0 time point), cells were either continually cultured in the presence of LIF or treated with 1 μM retinoic acid in the absence of LIF for up to 5 days. Phase-contrast images of the cells were taken daily using a 20× objective, and cell samples were collected daily for RNA extraction. Transcript levels of Tet1, Tet2, Tet3 and Oct4 were measured by quantitative RT-PCR, normalized to β-actin levels, and expressed relative to levels at D0. Error bars denote mean ± SD from 2 experiments. Tet1 and Tet2, as well as the positive-control pluripotency gene Oct4, are downregulated, whereas Tet3 is upregulated, during RA-induced differentiation.
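Normalizing qRT-PCR data to a reference gene and to the D0 sample is commonly done with the comparative Ct (2^-ΔΔCt) method. The text does not name its quantification method, so this sketch assumes ΔΔCt with 100% amplification efficiency (a doubling per cycle). The Ct values and the function name are hypothetical.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_d0: float, ct_ref_d0: float,
                        efficiency: float = 2.0) -> float:
    """Fold change of a target transcript versus D0 by the comparative Ct method.
    ct_ref is the reference gene (here beta-actin); efficiency=2.0 assumes perfect doubling."""
    delta_sample = ct_target - ct_ref
    delta_d0 = ct_target_d0 - ct_ref_d0
    return efficiency ** -(delta_sample - delta_d0)


# Hypothetical Ct values: the target rises 2 cycles relative to a stable beta-actin
print(relative_expression(26.0, 16.0, 24.0, 16.0))  # downregulated gene
print(relative_expression(22.0, 16.0, 24.0, 16.0))  # upregulated gene
```

A 2-cycle increase in Ct relative to the reference corresponds to a 4-fold decrease (0.25), and a 2-cycle decrease corresponds to a 4-fold increase.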

We asked whether 5-hydroxymethylcytosine is a physiological constituent of mammalian DNA. Using the TLC assay, we observed a clear spot corresponding to labeled 5-hydroxymethylcytosine in mouse embryonic stem (ES) cells. Quantification of multiple experiments indicated that 5-hydroxymethylcytosine and 5-methylcytosine constituted 4 to 6% and 55 to 60%, respectively, of all cytosine species in MspI cleavage sites (CCGG) in ES cells. Tet1 mRNA levels declined by 80% in response to leukemia inhibitory factor (LIF) withdrawal for 5 days, compared with the levels observed in undifferentiated ES cells; in parallel, 5-hydroxymethylcytosine levels diminished from 4.4 to 2.6% of total C species, a decline of ~40% from control levels. The difference might be due to the compensatory activity of other Tet-family proteins. Similarly, RNA interference (RNAi)-mediated depletion of endogenous Tet1 resulted in an 87% decrease in Tet1 mRNA levels and a parallel ~40% decrease in 5-hydroxymethylcytosine levels. Again, the difference is likely due to the presence of Tet2 and Tet3, which are both expressed in ES cells.
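The ~40% figure is a relative decline computed from the two 5-hydroxymethylcytosine percentages given above. A one-function sketch (the function name is hypothetical) makes the arithmetic explicit and contrasts it with the 80% drop in Tet1 mRNA.

```python
def percent_decline(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, as a percentage of `before`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (before - after) / before


hmc_decline = percent_decline(4.4, 2.6)  # 5hmC as % of total C species, before vs. after LIF withdrawal
print(round(hmc_decline, 1))
```

The 5-hydroxymethylcytosine level therefore falls by about 41%, roughly half the 80% reduction in Tet1 mRNA, which is the gap the text attributes to Tet2 and Tet3.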

We examined the effect of Tet RNAi on ES cell lineage marker gene expression. Twenty-four hours after plating on gelatin-coated wells (D0 time point), v6.5 ES cells were transfected with siGENOME SMARTpool (Dharmacon) siRNA targeting Tet1, Tet2 or Tet3, or with a luciferase (luc)-targeting siRNA as a negative control, using Lipofectamine RNAiMAX (Invitrogen) in the presence of LIF. Cells were passaged and re-transfected pre-adherent at days 2 and 4 in the presence of LIF. Samples were collected at days 3 (D3) and 5 (D5) for RNA isolation. Phase-contrast images were taken at day 5 (2 fields per transfection). Knockdown of Tet proteins causes appreciable spontaneous ES cell differentiation (especially apparent with Tet3 knockdown, right panels). FIG. 12 shows the degree of knockdown of Tet1, Tet2 and Tet3 RNA, measured by quantitative RT-PCR and normalized to Gapdh levels, in cells treated with Tet1, Tet2 and Tet3 siRNAs. FIG. 12 (middle and bottom rows) shows expression of Tet1-Tet3 and of trophectoderm (Cdx2, Hand1, Psx1), primitive endoderm (Gata4), mesoderm (Brachyury) and primitive ectoderm (Fgf5) markers, measured by quantitative RT-PCR and normalized to Gapdh levels. Expression in D3 control siRNA-treated cells was set as the reference.

Without wishing to be bound by a theory, our data indicate that Tet1 and other Tet family members are responsible for 5-hydroxymethylcytosine generation in ES cells under physiological conditions. CpG dinucleotides are ~0.8% of all dinucleotides in the mouse genome; thus, 5-hydroxymethylcytosine (which constitutes ~4% of all cytosine species in CpG dinucleotides located in MspI cleavage sites) is ~0.032% of all bases (0.008 × 0.04 = 0.00032, or ~1 in every 3000 nucleotides, or ~2×10⁶ bases per haploid genome), assuming that CpGs in MspI sites are representative of all CpGs. For comparison, 5-methylcytosine is 55 to 60% of all cytosines in CpG dinucleotides in MspI cleavage sites, about 14 times as high as 5-hydroxymethylcytosine (although 5-hydroxymethylcytosine may not be confined to CpG). An important question is whether 5-hydroxymethylcytosine and TET proteins are localized to specific regions of ES cell DNA, for instance genes that are involved in maintaining pluripotency or that are poised to be expressed upon differentiation. A full appreciation of the biological importance of 5-hydroxymethylcytosine will require the development of tools that allow 5-hydroxymethylcytosine, 5-methylcytosine, and cytosine to be distinguished unequivocally.
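The abundance estimate above is simple arithmetic, and the following Python sketch reproduces it so that its assumptions are explicit. The CpG frequency and the 4% and 55 to 60% figures come from the text; the haploid genome size (~2.7×10⁹ bp) and the choice to count bases on both strands are assumptions added here, chosen because together they reproduce the ~2×10⁶ figure.

```python
"""Back-of-envelope estimate of genomic 5hmC abundance in mouse ES cells.

Inputs marked 'text' are the approximate values quoted above; inputs marked
'assumed' are illustrative and not taken from the text.
"""

CPG_PER_DINUCLEOTIDE = 0.008      # text: CpG is ~0.8% of all dinucleotides
HMC_PER_CPG_C = 0.04              # text: 5hmC is ~4% of C species at MspI CpGs
MC_PER_CPG_C = (0.55, 0.60)       # text: 5mC is 55-60% of the same C species
HAPLOID_GENOME_BP = 2.7e9         # assumed: approximate mouse haploid genome size
STRANDS = 2                       # assumed: count bases on both strands


def hmc_fraction_of_bases(cpg_freq: float, hmc_per_cpg_c: float) -> float:
    """Fraction of all bases that are 5hmC.

    Assumes 5hmC occurs only at CpG and that CpGs in MspI sites are
    representative of all CpGs. Each CpG contributes one C per strand, so
    the fraction of bases that are CpG cytosines is roughly cpg_freq.
    """
    return cpg_freq * hmc_per_cpg_c


frac = hmc_fraction_of_bases(CPG_PER_DINUCLEOTIDE, HMC_PER_CPG_C)
one_in_n = 1.0 / frac
hmc_count = frac * HAPLOID_GENOME_BP * STRANDS
mc_to_hmc = [mc / HMC_PER_CPG_C for mc in MC_PER_CPG_C]

print(f"5hmC fraction of bases: {frac:.5%}")
print(f"about 1 in {one_in_n:,.0f} nucleotides")
print(f"about {hmc_count:.1e} 5hmC per haploid genome (both strands)")
print(f"5mC/5hmC ratio: {mc_to_hmc[0]:.1f} to {mc_to_hmc[1]:.1f}")
```

With these inputs the count comes out near 1.7×10⁶, consistent with the rounded ~2×10⁶ in the text; counting only one strand would halve it.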

As a potentially stable base, 5-hydroxymethylcytosine may influence chromatin structure and local transcriptional activity by recruiting selective 5-hydroxymethylcytosine-binding proteins or by excluding methyl-CpG-binding proteins (MBPs) that normally recognize 5-methylcytosine, thus displacing chromatin-modifying complexes recruited by MBPs. Indeed, it has already been demonstrated that the methyl-binding protein MeCP2 does not recognize 5-hydroxymethylcytosine (V. Valinluck et al., Nucleic Acids Res. 32, 4100 (2004)). Alternatively, without wishing to be bound by a theory, conversion of 5-methylcytosine to 5-hydroxymethylcytosine may facilitate passive DNA demethylation by excluding the maintenance DNA methyltransferase DNMT1, which recognizes 5-hydroxymethylcytosine poorly (V. Valinluck and L. C. Sowers, Cancer Res. 67, 946 (2007)). Even a minor reduction in the fidelity of maintenance methylation would be expected to result in an exponential decrease in CpG methylation over the course of many cell cycles. Finally, 5-hydroxymethylcytosine may be an intermediate in a pathway of active DNA demethylation. 5-hydroxymethylcytosine has been shown to yield cytosine through loss of formaldehyde in photooxidation experiments (E. Privat and L. C. Sowers, Chem. Res. Toxicol. 9, 745 (1996)) and at high pH (J. G. Flaks, S. S. Cohen, J. Biol. Chem. 234, 1501 (1959); A. H. Alegria, Biochim. Biophys. Acta 149, 317 (1967)), leaving open the possibility that 5-hydroxymethylcytosine could convert to cytosine under certain conditions in cells. A related possibility is that specific DNA repair mechanisms replace 5-hydroxymethylcytosine or its derivatives with cytosine (S. K. Ooi, T. H. Bestor, Cell 133, 1145 (2008); J. Jiricny, M. Menigatti, Cell 135, 1167 (2008)). In support of this hypothesis, a glycosylase activity specific for 5-hydroxymethylcytosine was reported in bovine thymus extracts (S. V. Cannon, et al., Biochem. Biophys. Res. Commun. 151, 1173 (1988)). Moreover, several DNA glycosylases, including TDG and MBD4, have been implicated in DNA demethylation, although none of them has shown convincing activity on 5-methylcytosine in in vitro enzymatic assays (B. Zhu et al., Proc. Natl. Acad. Sci. U.S.A. 97, 5135 (2000); R. Metivier et al., Nature 452, 45 (2008); S. Kangaspeska et al., Nature 452, 112 (2008)). Cytosine deamination has also been implicated in demethylation of DNA (R. Metivier et al., Nature 452, 45 (2008); S. Kangaspeska et al., Nature 452, 112 (2008); K. Rai et al., Cell 135, 1201 (2008)); in this context, deamination of 5-hydroxymethylcytosine yields 5-hydroxymethyluracil (hmU), and high levels of hmU:G glycosylase activity have been reported in fibroblast extracts (V. Rusmintratip and L. C. Sowers, Proc. Natl. Acad. Sci. U.S.A., 97, 14183 (2000)).
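The statement that a small loss of maintenance fidelity compounds over many divisions can be illustrated with a deliberately simplified model. The model is an assumption of this sketch, not something stated in the text: at every cell cycle a methylated CpG is copied onto the daughter strand with a fixed probability f (the maintenance fidelity), and no de novo methylation occurs. Under that model the methylated fraction after n divisions is f raised to the power n.

```python
"""Toy model of passive (replication-dependent) loss of CpG methylation.

Assumptions (illustrative only): maintenance methylation succeeds with a fixed
probability `fidelity` at every CpG in every cell cycle, failures are never
repaired, and there is no de novo methylation.
"""


def methylated_fraction(fidelity: float, divisions: int, start: float = 1.0) -> float:
    """Expected fraction of CpGs still methylated after `divisions` cell cycles."""
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must be between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    # Geometric (exponential) decay: each division multiplies by `fidelity`.
    return start * fidelity ** divisions


if __name__ == "__main__":
    for f in (0.999, 0.99, 0.95):
        remaining = [methylated_fraction(f, n) for n in (10, 50, 100)]
        print(f, ["%.3f" % r for r in remaining])
```

Real cells also carry out de novo and hemimethylated-site repair methylation, so this sketch only shows the shape of the decline, not its actual rate.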

Our studies alter the perception of how cytosine methylation may be regulated in mammalian cells. Notably, disruptions of the TET1 and TET2 genetic loci have been reported in association with hematologic malignancies. A fusion of TET1 with the histone methyltransferase MLL has been identified in several cases of acute myeloid leukemia (AML) associated with the t(10;11)(q22;q23) translocation (R. Ono et al., Cancer Res. 62, 4075 (2002); R. B. Lorsbach et al., Leukemia 17, 637 (2003)). Homozygous null mutations and chromosomal deletions involving the TET2 locus have been found in myeloproliferative disorders, suggesting a tumor suppressor function for TET2 (F. Viguie et al., Leukemia 19, 1411 (2005); F. Delhommeau et al., paper presented at the American Society of Hematology Annual Meeting and Exposition, San Francisco, Calif., Dec. 9, 2008). It will be important to test the involvement of TET proteins and 5-hydroxymethylcytosine in oncogenic transformation and malignant progression.

The Role of Tet Oncogene Proteins in Mouse Embryonic Stem Cells

By computational analysis, we identified the TET proteins, TET1, TET2 and TET3, as mammalian homologs of the trypanosome J-binding proteins JBP1 and JBP2, which have been proposed to oxidize the 5-methyl group of thymine. We have found that TET1/CXXC6, previously characterized as a fusion partner of the MLL gene in acute myeloid leukemia, is an iron- and α-ketoglutarate-dependent dioxygenase that catalyzes the conversion of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (hmC), both as a recombinant protein in vitro and when overexpressed in cultured HEK293 cells (Tahiliani, M., et al., Science, 2009: 324(5929): p. 930-935). We find that 5-hydroxymethylcytosine can be detected in the genome of mouse embryonic stem (ES) cells but not in differentiated cell types. Tet1 and Tet2, but not Tet3, are highly expressed in mouse ES cells, and RNAi-mediated depletion of both Tet1 and Tet2 causes loss of 5-hydroxymethylcytosine. Tet1 and Tet2 are repressed rapidly in parallel with Oct4 when ES cells are cultured in the absence of leukemia inhibitory factor (LIF), whereas additional treatment with retinoic acid leads to induction of Tet3 during differentiation. These changes correspond with a decrease in genomic 5-hydroxymethylcytosine levels. Loss of pluripotency caused by Oct4 RNAi also downregulates Tet1 and Tet2 expression, with loss of 5-hydroxymethylcytosine. On the other hand, gain of pluripotency in induced pluripotent stem (iPS) cells reprogrammed from mouse fibroblasts is associated with induction of both Tet1 and Tet2 and appearance of 5-hydroxymethylcytosine. RNAi-depletion of each Tet member does not decrease mRNA levels of the pluripotency-associated genes Oct4, Sox2 and Nanog, but Tet1 RNAi results in induction of genes that specify the trophectodermal lineage. Our results suggest (i) that Tet1 and Tet2 catalyze conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mouse ES cells; (ii) that Tet1, Tet2 and 5-hydroxymethylcytosine are associated with the pluripotent state; (iii) that Tet1 and Tet2 are downstream targets of the transcriptional network regulated by Oct4; and (iv) that Tet1 is a novel factor involved in repression of trophectoderm lineage development during the first cell-fate decision in mouse embryogenesis.

We used the following methods in our analyses. To perform immunofluorescence, we transfected cells for 2 days with pEF1a expression constructs encoding full-length (FL) TET1 or the catalytic domain alone (TET1 CD), each carrying an N-terminal HA epitope tag, or with empty vector (mock), as depicted in FIG. 13A. Fixed cells were treated with 2 N HCl to denature DNA before co-staining with rabbit anti-HA (Santa Cruz Biotechnology) and mouse anti-5-methylcytosine (Calbiochem) antibodies, which were detected using secondary antibodies coupled to Cy2 or Cy3, respectively. Nuclei were stained with DAPI before mounting for fluorescence imaging.

To perform thin-layer chromatography (TLC), genomic DNA was digested with the restriction endonuclease MspI, which cleaves at CCGG sites, to generate fragments whose 5′ ends derive from the dinucleotide CpG and contain either 5-methylcytosine, C or 5-hydroxymethylcytosine. The digested DNA was then radiolabeled at the 5′ ends and hydrolyzed from the 3′ ends to single dNMPs, which were resolved by TLC. Spot intensities were measured by phosphorimaging densitometry, and 5-hydroxymethylcytosine levels are represented as percentages of total cytosine (5mC + C + hmC). Values are mean±SD from triplicate samples (FIG. 13A).
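The quantification described above reduces to one ratio per lane followed by a mean and standard deviation over replicates. The Python sketch below follows the stated formula, hmC / (5mC + C + hmC) expressed as a percentage; the spot intensities are hypothetical placeholders rather than data from the text, and the use of the sample standard deviation is an assumption.

```python
"""Quantify 5hmC from TLC spot intensities (sketch).

The intensities below are hypothetical placeholder numbers, not data from
the text; only the formula (hmC / (5mC + C + hmC)) and the use of
mean +/- SD over triplicates follow the description above.
"""
from statistics import mean, stdev


def percent_hmc(mc: float, c: float, hmc: float) -> float:
    """5hmC as a percentage of total cytosine species in one TLC lane."""
    total = mc + c + hmc
    if total <= 0:
        raise ValueError("spot intensities must sum to a positive value")
    return 100.0 * hmc / total


def summarize(lanes):
    """Return (mean, sample SD) of percent 5hmC over replicate lanes."""
    values = [percent_hmc(*lane) for lane in lanes]
    return mean(values), stdev(values)


# Hypothetical background-subtracted phosphorimager counts: (5mC, C, hmC)
triplicate = [(575.0, 385.0, 40.0), (560.0, 395.0, 45.0), (590.0, 370.0, 40.0)]
avg, sd = summarize(triplicate)
print(f"hmC = {avg:.2f} +/- {sd:.2f} % of total C")
```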

To perform cell culture and RNA interference (RNAi), v6.5 mouse ES cells were maintained on feeder layers in standard ES medium but were replated on gelatin-coated wells for the experiments described. RNAi experiments were performed using Dharmacon siGENOME siRNA duplexes. Mouse ES cells were transfected with 50 nM siRNA using Lipofectamine RNAiMAX reagent (Invitrogen) in the presence of LIF. Retransfections were performed on pre-adherent cells every 2 days, and cells were harvested at day 5 for RNA and TLC analyses.

We performed RNA extraction, cDNA synthesis and quantitative real-time PCR analyses. Briefly, total RNA was isolated with an RNeasy kit (Qiagen) with on-column DNase treatment. cDNA was synthesized from 0.5 mg total RNA using SuperScript III reverse transcriptase (Invitrogen). Quantitative PCR was performed using FastStart Universal SYBR Green master mix (Roche) on a StepOnePlus real-time PCR system (Applied Biosystems). Gene expression was normalized to Gapdh and referenced to day 0 samples. Data shown are mean±SEM, n=3-4.
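The text states that expression was normalized to Gapdh and referenced to day 0 but does not name the calculation. A common choice is the comparative Ct (2^-ΔΔCt) method; the Python sketch below assumes that method and roughly 100% amplification efficiency for both target and reference, and its Ct values are hypothetical.

```python
"""Relative expression by the comparative Ct (2^-ddCt) method (assumed).

Assumes ~100% PCR efficiency (one doubling per cycle) for both the target
gene and the Gapdh reference; Ct values used here are hypothetical.
"""
from statistics import mean


def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_d0: float, ct_ref_d0: float) -> float:
    """Expression of a target gene relative to day 0, normalized to a reference gene."""
    delta_ct_sample = ct_target - ct_ref          # normalize sample to Gapdh
    delta_ct_d0 = ct_target_d0 - ct_ref_d0        # same for the day-0 calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_d0
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values (target, Gapdh) for a day-5 sample
day0 = (24.0, 18.0)
day5 = [(26.1, 18.0), (25.9, 18.1), (26.0, 17.9)]
levels = [relative_expression(t, r, *day0) for t, r in day5]
print(f"mean relative expression at day 5: {mean(levels):.2f}")
```

A value below 1 indicates downregulation relative to day 0. If the primer efficiencies differ substantially from 100%, an efficiency-corrected model would be more appropriate.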

We identified 5-hydroxymethylcytosine as the catalytic product of TET1 acting on 5-methylcytosine and detected 5-hydroxymethylcytosine in the genome of mouse ES cells (FIG. 13C). We showed that overexpression of HA-TET1 in HEK293 cells causes loss of staining with an antibody to 5-methylcytosine. TLC of DNA from cells overexpressing full-length (FL) TET1 or the predicted catalytic domain (CD) reveals the appearance of an additional nucleotide species, identified by mass spectrometry as 5-hydroxymethylcytosine. H1671Y and D1673A mutations at the residues predicted to bind Fe(II) abrogate the ability of TET1 to generate 5-hydroxymethylcytosine, and 5-hydroxymethylcytosine is detected in the genome of mouse ES cells (FIG. 13B).

We found a role for murine Tet1 and Tet2 in the catalytic generation of 5-hydroxymethylcytosine in ES cells. The mouse genome encodes three family members, Tet1, Tet2 and Tet3, that share significant sequence homology with their human homologs (FIG. 14A) (Lorsbach, R. B., et al., Leukemia, 2003. 17(3): p. 637-41). Tet1 and Tet3 encode a CXXC domain within their first conserved coding exon. We show that mouse ES cells express high levels of Tet1 and Tet2, but not Tet3 (FIG. 15), and that Tet1 and Tet2 can be depleted with RNAi (FIG. 14). RNAi-depletion of Tet1 or Tet2 alone decreases 5-hydroxymethylcytosine levels partially, whereas combined RNAi reduces 5-hydroxymethylcytosine levels further, suggesting that both Tet1 and Tet2 are responsible for the catalytic conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mouse ES cells.

We showed that changes in Tet family gene expression occur in mouse ES cells upon differentiation. mRNA levels of Tet1, Tet2 and Oct4 rapidly decline upon LIF withdrawal (FIG. 15). Tet3 levels remain low upon LIF withdrawal but increase 10-fold with addition of retinoic acid (FIG. 15C). The decline of Tet1 and Tet2 expression is associated with loss of 5-hydroxymethylcytosine.

We found that Tet1, Tet2 and 5-hydroxymethylcytosine are associated with pluripotency. Loss of pluripotency induced by RNAi-mediated depletion of Oct4 potently suppresses Tet1 and Tet2 expression and upregulates Tet3 (FIGS. 16A-16C). Sox2 RNAi causes a similar, though weaker, effect, whereas Nanog RNAi has almost no effect (FIGS. 16A-16C). RNAi-depletion of Oct4 in particular causes loss of 5-hydroxymethylcytosine in ES cells. Gain of pluripotency in iPS clones derived from mouse tail-tip fibroblasts (TTF) by viral transduction of Oct4, Sox2, Klf4 and c-Myc is associated with upregulation of Tet1 and Tet2 and appearance of 5-hydroxymethylcytosine in the genome (FIGS. 16D-16E).

We show that knockdown of Tet family members affects ES cell pluripotency and differentiation genes. RNAi-mediated knockdown of each Tet family member does not affect expression of the pluripotency factors Oct4, Sox2 and Nanog (FIGS. 17A-17C). RNAi-depletion of Tet1, but not of Tet2 or Tet3, increases the expression of the trophectodermal genes Cdx2, Eomes and Hand1 (FIGS. 17D-17F). RNAi-depletion of Tet family members produces small, insignificant changes in expression of extraembryonic endoderm, mesoderm and primitive ectoderm markers (FIGS. 17G-17I).

The Effect of 5-Hydroxymethylcytosine on Sodium Bisulfite-Based Analysis of DNA Methylation Status

Treatment of DNA with sodium bisulfite promotes the deamination of cytosine to uracil, while 5-methylcytosine is deaminated at a far slower rate, allowing the methylation state of a given cytosine to be ascertained. The reactions of sodium bisulfite with cytosine, 5-methylcytosine and 5-hydroxymethylcytosine differ, as depicted in FIG. 23. During bisulfite-mediated deamination of cytosine, HSO₃⁻ reversibly and quickly adds across the 5,6 double bond of cytosine, promoting deamination at position 4 and conversion to U-SO₃⁻. U-SO₃⁻ is stable under neutral conditions but is easily desulfonated to uracil at higher pH. 5-methylcytosine is deaminated to thymine by bisulfite, but at a rate approximately two orders of magnitude slower than that of cytosine. Recently, 5-hydroxymethylcytosine was shown to be present in mammalian DNA (S. Kriaucionis and N. Heintz, Science 324, 929 (2009); M. Tahiliani et al., Science 324, 930 (2009)). Bisulfite reacts with 5-hydroxymethylcytosine to form cytosine 5-methylenesulfonate. This adduct does not readily undergo deamination (H. Hayatsu, et al., Biochemistry 9, 2858 (1970); R. Y. Wang, et al., Nucleic Acids Res 8, 4777 (1980); H. Hayatsu and M. Shiragami, Biochemistry 18, 632 (1979)).
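The practical consequence of these three reactions for sequencing can be sketched as a lookup from template base to the base read after bisulfite treatment and PCR. The Python sketch below assumes idealized chemistry (complete conversion of unmodified C, no conversion of 5-methylcytosine or 5-hydroxymethylcytosine) and uses a hypothetical one-letter encoding, M for 5-methylcytosine and H for 5-hydroxymethylcytosine, that does not come from the text.

```python
"""Idealized in-silico bisulfite readout of one DNA strand (sketch).

Hypothetical encoding: 'C' = cytosine, 'M' = 5-methylcytosine,
'H' = 5-hydroxymethylcytosine; 'A', 'G', 'T' as usual. Assumes complete
deamination of C and none of M or H.
"""

BISULFITE_READOUT = {
    "C": "T",  # C -> U (via U-SO3-), read as T after PCR
    "M": "C",  # 5mC deaminates ~100x more slowly; read as C
    "H": "C",  # 5hmC forms cytosine 5-methylenesulfonate; read as C
    "A": "A",
    "G": "G",
    "T": "T",
}


def bisulfite_read(strand: str) -> str:
    """Return the sequence expected after bisulfite conversion and PCR."""
    out = []
    for base in strand.upper():
        if base not in BISULFITE_READOUT:
            raise ValueError(f"unexpected symbol: {base!r}")
        out.append(BISULFITE_READOUT[base])
    return "".join(out)


# 5mC and 5hmC at the same CpG give identical reads:
print(bisulfite_read("ACGTMGAHGC"))
print(bisulfite_read("TMGA") == bisulfite_read("THGA"))
```

Because M and H map to the same output, no analysis of the converted sequence alone can recover which modification was present.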

Bisulfite sequencing usually entails PCR amplification of a region of bisulfite-treated genomic DNA containing the cytosines of interest, followed by sequencing of PCR clones. Cytosine-to-thymine transitions will be observed at all unmethylated cytosines (M. Frommer et al., Proc Natl Acad Sci USA 89, 1827 (1992)). To test whether the bulky cytosine 5-methylenesulfonate adduct impedes PCR amplification of the treated DNA, we generated DNA templates containing cytosine, 5-methylcytosine or 5-hydroxymethylcytosine as their sole cytosine species, as shown in FIG. 24. To do this, we PCR-amplified a 201 bp oligonucleotide using the nucleoside triphosphates dATP, dGTP and dTTP together with dCTP or its 5-methylcytosine or 5-hydroxymethylcytosine derivative. The PCR products were treated with bisulfite, exposed to conditions promoting deamination and desulfonation, and amplified with the primers SEQ ID NO: 7: ATTGTCGTAGGTTAAGTGGATTGTAAGGAGGTAG and SEQ ID NO: 8: ATTCACTACCACTCTCCTTACTTCTCTTTCTCC (reverse primer, also used for primer extension).

Under these conditions, 5-hydroxymethylcytosine-containing DNA was very poorly amplified compared to cytosine- and 5-methylcytosine-containing DNA. Sequencing of the amplified DNA confirmed that bisulfite-treated 5-hydroxymethylcytosine did not undergo cytosine-to-thymine transitions, demonstrating, as expected, that 5-hydroxymethylcytosine and 5-methylcytosine cannot be distinguished by the bisulfite technique. Since 5-hydroxymethylcytosine is present in embryonic stem (ES) cells at a level ~10% of that of 5-methylcytosine (M. Tahiliani et al., Science 324, 930 (2009)), it is likely that a proportion of the regions identified as methylated in the ES cell genome (C. R. Farthing et al., PLoS Genet 4, e1000116 (2008); B. H. Ramsahoye et al., Proc Natl Acad Sci USA 97, 5237 (2000)) are actually hydroxymethylated.

To determine whether a block in PCR amplification occurred, we performed primer extension assays using two commercial sources of Taq polymerase. A ladder of incomplete extension products was seen only with bisulfite-treated, 5-hydroxymethylcytosine-containing DNA, in which the 5-hydroxymethylcytosine had been converted to the bulky cytosine 5-methylenesulfonate. The most significant stalling occurred at positions across from a CTC sequence close to the end of the reverse primer, and from a CCGC sequence and several CC sequences further away. At some cytosine residues, stalling was weak or did not occur. Thus, cytosine 5-methylenesulfonate stalls but does not block Taq polymerase, and the stalling is particularly striking when two cytosine 5-methylenesulfonate residues are adjacent (FIG. 25).

In mammalian DNA, 5-methylcytosine (and therefore its hydroxylated derivative, 5-hydroxymethylcytosine) is found almost exclusively in the context of the dinucleotide CpG (B. H. Ramsahoye et al., Proc Natl Acad Sci USA 97, 5237 (2000); Y. Gruenbaum, et al., FEBS Lett 124, 67 (1981); M. Ehrlich, R. Y. Wang, Science 212, 1350 (1981)). To evaluate the degree to which cytosine 5-methylenesulfonate (CMS) would stall Taq polymerase in this physiological context, we synthesized a set of 158 bp oligonucleotides in which the top strand contained one common CG dinucleotide (in the sequence TCGA, highlighted in FIG. 24B) and a second variable sequence that was one of the following: GGAT, CGAT, CCAT, CGCG, or CCGG (indicated by XXXX in FIG. 24B). After bisulfite treatment, the most significant stalling was observed at the tandem CC sequences in the CC and CCGG oligonucleotides. A minor amount of stalling was observed at the same position in the 2-CG (two non-contiguous CGs) and CGCG oligonucleotides. Nevertheless, the 1-CG, 2-CG and CGCG oligonucleotides were efficiently amplified after bisulfite treatment, whereas oligonucleotides containing CC sequences showed a perceptible decrease in amplification efficiency (FIG. 25). The primers used for amplification were SEQ ID NO: 9: GTGAAATATTGTGGTAGGTTAAGTGGATTGTAAGGAG and SEQ ID NO: 10: CATCTTAATTAACACTACCACTCTCCTTACTTCTCTTTCT.
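The observation that stalling is strongest where two converted residues are adjacent suggests a simple way to flag candidate stall sites in a template. The Python sketch below is illustrative only. It uses a hypothetical "H" encoding for 5-hydroxymethylcytosine, treats every C as 5-hydroxymethylcytosine (as in the templates above, where it was the sole cytosine species), and assumes the mapping of variable sequences to oligonucleotide names (GGAT as 1-CG, CGAT as 2-CG, CCAT as CC), which the text implies but does not state explicitly.

```python
"""Flag runs of adjacent 5hmC residues as candidate polymerase stall sites.

Illustrative heuristic only. 'H' marks 5-hydroxymethylcytosine, which becomes
cytosine 5-methylenesulfonate (CMS) after bisulfite treatment.
"""
import re

# Assumed mapping of oligonucleotide names to the variable XXXX sequence.
VARIABLE_REGIONS = {
    "1-CG": "GGAT",
    "2-CG": "CGAT",
    "CC": "CCAT",
    "CGCG": "CGCG",
    "CCGG": "CCGG",
}


def as_hmc_template(seq: str) -> str:
    """Model a template in which 5hmC is the sole cytosine species."""
    return seq.upper().replace("C", "H")


def adjacent_hmc_runs(seq: str, min_run: int = 2):
    """Return (start, end) spans of runs of at least `min_run` consecutive 'H'."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    pattern = re.compile("H{%d,}" % min_run)
    return [m.span() for m in pattern.finditer(seq.upper())]


for name, xxxx in VARIABLE_REGIONS.items():
    runs = adjacent_hmc_runs(as_hmc_template(xxxx))
    print(f"{name:5s} {xxxx}: adjacent-CMS runs at {runs}")
```

This heuristic captures only the strongest effect (tandem CMS residues in the CC and CCGG oligonucleotides). It does not predict the minor stalling the text reports for the 2-CG and CGCG oligonucleotides.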

We postulated that if cytosine 5-methylenesulfonate can stall DNA polymerase, genomic loci containing hydroxymethylated DNA might be underrepresented in quantitative methylation analyses. To evaluate this point, we examined the MLH1 locus, which is known to be heavily methylated in HEK293T cells (S. Fukushige, et al., Biochem Biophys Res Commun 377, 600 (2008)). We confirmed this by bisulfite sequencing of genomic DNA purified from HEK293T cells (FIG. 26). The primers used for sequencing were SEQ ID NO: 11: GTGAATTAAGGATTTTTTTGTGTG and SEQ ID NO: 12: AAAAAACATTTCCCTACTTC. Two different amplicons in the MLH1 locus were shown to contain more than 10 highly methylated CpGs; methylated cytosines, which do not undergo C-to-T transitions, are shown in bold, whereas partially methylated C's, which yielded a mixture of C and T after bisulfite sequencing, are highlighted and indicated by Y (FIG. 26). The primers used to amplify the MLH1 locus amplicons were SEQ ID NO: 13: GTTAGATTATTTTAGTAGAGGTATATAAGT and SEQ ID NO: 14: ACCAATCAAATTTCTCAACTCTAT; and SEQ ID NO: 15: TGAGAAATTTGATTGGTATTTAAGTTG and SEQ ID NO: 16: CAATCATCTCTTTAATAACATTAACTAACC. We then treated the genomic DNA with the recombinant catalytic domain of TET1 in vitro. Roughly 80% of 5-methylcytosine in MspI or TaqαI sites was converted to 5-hydroxymethylcytosine (FIG. 27). Real-time PCR analysis showed that untreated and TET1-treated (hydroxymethylated) DNAs were amplified with almost identical efficiency (FIG. 26), even though each amplicon contained more than 10 highly methylated CpGs.

In summary, we have shown that the bisulfite technique for DNA methylation analysis does not distinguish between 5-hydroxymethylcytosine and 5-methylcytosine; that loci containing dense regions of hydroxymethylated DNA may be underrepresented in quantitative methylation analyses; and that primer extension reactions conducted with bisulfite-treated DNA would be predicted to terminate disproportionately at sites of hydroxymethylation. It should be possible to take advantage of our findings, combining ligation-mediated PCR with primer extension under suboptimal extension conditions, to determine the location of 5-hydroxymethylcytosine in the genome. It is unclear how CMS inhibits PCR. Rein et al. proposed that CMS would block DNA polymerase by analogy to oxidative pyrimidine adducts such as thymine glycol (T. Rein, et al., Nucleic Acids Res 26, 2255 (1998)). However, CMS retains aromaticity, whereas it has since been demonstrated that polymerases are disrupted by thymine glycol's loss of aromaticity and consequent adoption of a chair geometry (P. Aller, et al., Proc Natl Acad Sci USA 104, 814 (2007)). Whatever the mechanism, the observation that 5-hydroxymethylcytosine can stall Taq polymerase after bisulfite treatment may have important ramifications for the interpretation of previous DNA methylation analyses, as discussed above.

Materials and Methods

Minigenes were designed for generation of DNA templates containing cytosine, 5-methylcytosine or 5-hydroxymethylcytosine. Minigenes used as templates to amplify cytosine-, 5-methylcytosine- or 5-hydroxymethylcytosine-containing oligonucleotides were synthesized by Integrated DNA Technologies. DNA containing cytosine, 5-methylcytosine or 5-hydroxymethylcytosine was amplified by PCR using the nucleoside triphosphates dATP, dGTP and dTTP together with dCTP or its derivatives mdCTP (GE Healthcare) or hmdCTP (Bioline). PCR products were run on a 2% agarose gel to confirm the correct length and further purified with a gel extraction kit (Qiagen).

Bisulfite treatment and recovery of samples were carried out with the EpiTect Bisulfite kit (QIAGEN) following the manufacturer's instructions. In brief, 2 μg DNA in a 20 μL volume was used for each reaction and mixed with 85 μL bisulfite mix and 35 μL DNA Protect buffer. Bisulfite conversion was performed on a thermocycler as follows: 99° C. for 5 min, 60° C. for 25 min, 99° C. for 5 min, 60° C. for 85 min, 99° C. for 5 min, 60° C. for 175 min and 20° C. indefinitely. The bisulfite-treated DNA was recovered with an EpiTect spin column and subsequently sequenced to confirm the efficiency of bisulfite conversion.
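The thermocycler program above can be written as a list of (temperature, duration) steps, which makes the total run time and reaction volume easy to check. This Python sketch only restates parameters given in the text; the data layout and helper names are illustrative choices, not part of the kit protocol.

```python
"""Bisulfite conversion program and reaction mix from the text, as data."""

# (temperature in deg C, duration in minutes); the final 20 deg C hold is open-ended
CONVERSION_STEPS = [
    (99, 5),   # denature
    (60, 25),  # incubate
    (99, 5),
    (60, 85),
    (99, 5),
    (60, 175),
]
FINAL_HOLD_C = 20

REACTION_UL = {"DNA (2 ug)": 20, "bisulfite mix": 85, "DNA protect buffer": 35}


def total_minutes(steps) -> int:
    """Sum of all timed steps (excludes the open-ended final hold)."""
    return sum(minutes for _, minutes in steps)


def minutes_at(steps, temp_c: int) -> int:
    """Total time spent at one temperature."""
    return sum(minutes for t, minutes in steps if t == temp_c)


print(f"program: {total_minutes(CONVERSION_STEPS)} min before the {FINAL_HOLD_C} C hold")
print(f"denaturation: {minutes_at(CONVERSION_STEPS, 99)} min, "
      f"incubation: {minutes_at(CONVERSION_STEPS, 60)} min")
print(f"reaction volume: {sum(REACTION_UL.values())} uL")
```

The timed steps add up to 5 hours (15 min of denaturation at 99° C. and 285 min of incubation at 60° C.) in a 140 μL reaction.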

Real-time PCR of oligonucleotides was performed on the StepOnePlus real-time PCR system (Applied Biosystems) using the FastStart Universal SYBR Green Master kit (Roche). 0.1 μg DNA template and 0.15 mM primers were used in each reaction. The amplification program was set as: 95° C. for 10 min; 40 cycles of 95° C. for 15 sec and 60° C. for 1 min; and a melt curve analysis step at the end. Data were analyzed with StepOnePlus real-time PCR software.

To perform the primer extension assays, reverse primers (50 ng) were end-labeled with T4 polynucleotide kinase (T4 PNK) (NEB) and 10 μCi of [γ-32P]-ATP (PerkinElmer) for 1 hr at 37° C., and then purified on an Illustra MicroSpin G-25 column (GE Healthcare). For the primer extension, 2 ng template and 4 pmol [γ-32P]-labeled primer were used. Reactions were set up according to the manufacturer's instructions using two commercial sources of Taq DNA polymerase (Roche and Sigma). For Roche Taq DNA polymerase, the cycling conditions were: 95° C. for 10 min, then 30 cycles of 95° C. for 15 sec and 60° C. for 1 min. For Sigma TagRED polymerase, the cycling conditions were: 30 cycles of 94° C. for 1 min, 55° C. for 2 min and 72° C. for 1 min. The primer extension products were mixed with 2× gel loading buffer II (Ambion), denatured at 95° C. for 15 min and loaded onto a denaturing 12% polyacrylamide gel (7 M urea). Sanger sequencing was performed using the Thermo Sequenase Dye Primer Manual Cycle Sequencing kit (USB), with 2 ng template and 1 pmol [γ-32P]-labeled primer. The results were visualized by autoradiography.

Real-time PCR of bisulfite-treated genomic DNA was performed as follows. Genomic DNA was extracted from HEK293 cells (as described in H. Hayatsu, et al., Biochemistry 9, 2858 (1970)) and sheared by vortexing to facilitate pipetting. Recombinant human TET1 catalytic domain (CD) was expressed in insect cells as in (H. Hayatsu, et al., Biochemistry 9, 2858 (1970)). 12 μg of DNA was then reacted with 18 μg of TET1-CD in 50 mM HEPES pH 8.0, 50 mM NaCl, 2 mM ascorbic acid, 1 mM α-ketoglutarate, 100 μM ferrous ammonium sulfate (FAS) and 1 mM DTT. The total reaction volume was 300 μL, and the reaction ran for 90 minutes at 37° C. The untreated control (WT) sample was subjected to the same reaction conditions without enzyme.

The DNA was then ethanol precipitated by the addition of 0.1 volume of 3 M sodium acetate pH 7.4, linear polyacrylamide, and 3 volumes of ethanol, followed by freezing and centrifugation at 16,000 × g for 30 minutes at 4° C. The pellet was washed twice with 70% ethanol, air dried, and resuspended in 10 mM Tris, 0.1 mM EDTA. Resuspension proceeded overnight with gentle shaking at 45° C. About 500 ng of the DNA was digested with MspI or TaqαI, end labeled, digested to single nucleotides, and run on TLC as described. The data were analyzed on a phosphorimager. The strong cytosine peak seen in this experiment arises because the DNA was sheared beforehand, producing breaks not created by the restriction enzyme that were also end-labeled. This did not confound interpretation of methylation loss or the extent of hydroxymethylation.

The DNA was bisulfite treated as described above and quantified afterward using a NanoDrop ND-1000 spectrophotometer (Thermo Scientific). Because conversion of cytosine to uracil leaves the two strands no longer complementary, bisulfite-treated DNA can no longer reanneal, so an absorbance constant typical of single-stranded DNA (33 μg DNA/(mL·OD260 unit)) was used. Bisulfite treatment changes the absorption properties of DNA, so the estimated quantities could be off, but any error would be approximately consistent between the TET1-CD-treated and WT samples.
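The quantification step amounts to multiplying the absorbance at 260 nm by the single-stranded conversion factor of 33 μg/mL per A260 unit given above. The Python sketch below assumes a 1 cm (or instrument-normalized) path length and an optional dilution factor; the absorbance reading and eluate volume in the example are hypothetical.

```python
"""Estimate concentration of bisulfite-treated (single-stranded) DNA from A260."""

SS_DNA_UG_PER_ML_PER_A260 = 33.0  # conversion factor used in the text


def conc_ng_per_ul(a260: float, factor: float = SS_DNA_UG_PER_ML_PER_A260,
                   dilution: float = 1.0) -> float:
    """Concentration in ng/uL (numerically equal to ug/mL)."""
    if a260 < 0 or dilution <= 0:
        raise ValueError("a260 must be >= 0 and dilution > 0")
    return a260 * factor * dilution


def amount_ng(a260: float, volume_ul: float, **kwargs) -> float:
    """Total DNA in the eluate, in ng."""
    return conc_ng_per_ul(a260, **kwargs) * volume_ul


# Hypothetical reading for a 20 uL eluate
print(f"{conc_ng_per_ul(1.5):.1f} ng/uL, {amount_ng(1.5, 20.0):.0f} ng total")
```

Because the same factor is applied to both the TET1-CD-treated and WT samples, a systematic error in the factor largely cancels when the two are compared, which is the point the text makes.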

The primers used in the PCR of the CpG-less region in FIG. 26 and FIG. 27 were designed with the BiSearch primer design tool (R. Y. Wang, et al., Nucleic Acids Res 8, 4777 (1980)). An arbitrarily chosen long stretch of DNA lacking CpGs was used as input for the program, though a CpG had to be typed into the middle of the sequence to allow the input sequence to be processed. The primers used for the MLH1 promoter were taken from Fukushige et al. (S. Fukushige, et al., Biochem Biophys Res Commun 377, 600 (2008)), with a few bases added to raise their melting temperature.

The real-time PCR was performed using the FastStart Universal SYBR Green Master kit (Roche), with each primer present at a final concentration of 0.15 mM. PCR was run on a StepOnePlus Real-Time PCR System (Applied Biosystems) programmed for an initial 10-minute step at 95° C.; fifty cycles of 95° C. for 15 s, 50° C. for 30 s and 60° C. for 90 s; and a melt curve analysis step at the end. PCR products were run on an agarose gel to confirm that the correctly sized product was formed as the dominant band.

Real-time PCR products were handled with different pipettes from those used to set up PCRs, and on different surfaces, to prevent cross-contamination.

The Effect of 5-Hydroxymethylcytosine on Sodium Bisulfite-Based Analysis of DNA Methylation Status

DNA methylation at the carbon-5 position of cytosine (5-methylcytosine, also regarded as the “fifth” base) is a stable epigenetic mark found in eukaryotes that imparts an additional layer of heritable information upon DNA. In normal cells, DNA methylation plays vital roles in embryogenesis and development, regulation of gene expression, silencing of transposable elements, and genomic imprinting. In cancer cells, DNA hypermethylation of CpG-island promoters has been linked to aberrant silencing of tumor suppressor genes. Epigenomic profiling of DNA methylation could serve as a marker of cancer cells and an indicator of tumor prognosis, as well as a useful predictor of response to chemotherapy.

We have shown that 5-hydroxymethylcytosine is present in mammalian DNA, and that a novel family of proteins, the TET proteins, is capable of converting 5-methylcytosine to 5-hydroxymethylcytosine both in vitro and in vivo.

Bisulfite sequencing has been one of the most widely used techniques for global profiling of cytosine methylation patterns. Bisulfite sequencing relies on the fact that reaction with bisulfite promotes the deamination of unmethylated cytosine to yield uracil (read as thymine after PCR). Deamination occurs orders of magnitude more slowly with 5-methylcytosine and 5-hydroxymethylcytosine; 5-methylcytosine reacts poorly with bisulfite, whereas 5-hydroxymethylcytosine forms a distinct adduct, cytosine 5-methylenesulfonate. Thus, while unmethylated cytosine will be read as thymine, both 5-methylcytosine and 5-hydroxymethylcytosine will still be read as cytosine in subsequent PCR reactions. As a result, all cytosine methylation analyses to date run the risk of conflating 5-methylcytosine and 5-hydroxymethylcytosine. It is therefore likely that some genomic loci identified as methylated with traditional methods are actually hydroxymethylated.

To test whether this modification of 5-methylcytosine would affect bisulfite sequencing, we designed a set of experiments using synthesized 5-hydroxymethylcytosine-containing oligonucleotides and genomic DNA treated with TET protein.

The experimental design for the primer extension assays that we used is outlined below. We performed primer extension assays on DNA containing different cytosine species and compared the products alongside a Sanger sequencing ladder. Ladders of incomplete extension products were observed only in 5-hydroxymethylcytosine-containing DNA after bisulfite treatment, at positions corresponding to G in the Sanger sequencing ladder. Less full-length product was observed in the extension reaction with bisulfite-treated, 5-hydroxymethylcytosine-containing DNA.

We performed primer extension assays on DNA containing different CpG combinations: 1CpG, 2CpG, CGCG, CC and CCGG. Bands corresponding to stalled extension were notably observed in the 5-hydroxymethylcytosine-containing CC and CCGG oligonucleotides after bisulfite treatment. The stalling effect, though less obvious, was also observed in bisulfite-treated, 5-hydroxymethylcytosine-containing oligonucleotides with CG or CGCG.

We treated genomic DNA with TET1 and examined two MLH1 promoter amplicons, each of which contained more than ten fully methylated residues as determined by sequencing of bulk PCR product. TET1 treatment delayed amplification of these amplicons by less than one cycle. Amplification of a region lacking CpGs, and thus lacking 5-hydroxymethylcytosine, was similar in the WT and TET1-treated populations.

We designed a strategy for incorporating 5-methylcytosine and 5-hydroxymethylcytosine into designed oligonucleotides. We confirmed by TLC that 5-hydroxymethylcytosine was successfully incorporated into the oligonucleotide. Sequencing traces of 5-hydroxymethylcytosine-containing oligonucleotides before and after bisulfite treatment indicated that bisulfite-treated 5-hydroxymethylcytosine did not undergo cytosine-to-thymine transitions, whereas the control cytosine-containing oligonucleotides underwent complete cytosine-to-thymine conversion. We obtained real-time PCR amplification curves for oligonucleotides containing cytosine, 5-methylcytosine or 5-hydroxymethylcytosine before and after bisulfite treatment. The small lag observed for the bisulfite-treated cytosine oligonucleotide is due, in part, to the fact that after conversion of cytosine to uracil, this oligonucleotide can only be amplified from one of the two strands. We quantified the ΔCt values from these experiments.
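A ΔCt can be translated into an approximate fold difference in amplifiable template. Assuming ideal amplification (one doubling per cycle, an assumption not stated in the text), a lag of one cycle corresponds to a twofold difference, which is what halving the amplifiable template (one strand instead of two) would produce. A delay of less than one cycle, as reported for the TET1-treated MLH1 amplicons, therefore corresponds to less than a twofold difference. The Python sketch below makes that conversion; the efficiency parameter and example values are illustrative.

```python
"""Convert a real-time PCR Ct shift into a fold difference in starting template."""
import math


def fold_difference(delta_ct: float, efficiency: float = 1.0) -> float:
    """Fold difference in starting template implied by a Ct shift.

    efficiency=1.0 means perfect doubling each cycle (assumed default).
    A positive delta_ct (later amplification) means less starting template.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** delta_ct


def delta_ct_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Inverse: Ct shift expected for a given fold difference in template."""
    return math.log(fold) / math.log(1.0 + efficiency)


print(fold_difference(1.0))    # one-cycle lag
print(fold_difference(0.6))    # hypothetical sub-cycle lag
print(delta_ct_for_fold(2.0))  # halving the template
```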

In summary, we have shown that the bisulfite technique for DNA methylation analysis does not distinguish between 5-methylcytosine and 5-hydroxymethylcytosine; that loci containing dense regions of hydroxymethylated DNA may be under-represented in quantitative methylation analyses; and that primer extension reactions conducted with bisulfite-treated DNA would be predicted to terminate disproportionately at sites of hydroxymethylation.
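The first of these conclusions can be illustrated with a small in silico model of bisulfite readout. This is a sketch under simplifying assumptions: conversion is complete, and the one-letter codes "M" (5-methylcytosine) and "H" (5-hydroxymethylcytosine) are hypothetical placeholders, not a standard nucleotide alphabet. It shows only why sequence reads alone cannot tell the two modified bases apart; it does not model polymerase stalling.

```python
"""In silico bisulfite readout (sketch; complete conversion assumed).

Hypothetical one-letter codes used in template strings:
  C = unmodified cytosine      -> deaminated to uracil, read as T after PCR
  M = 5-methylcytosine         -> not converted, read as C
  H = 5-hydroxymethylcytosine  -> cytosine-5-methylsulfonate, read as C
Any other character (A, G, T) is passed through unchanged.
"""

READOUT_AFTER_BISULFITE = {"C": "T", "M": "C", "H": "C"}


def bisulfite_readout(template: str) -> str:
    """Return the base calls expected after bisulfite treatment and PCR."""
    return "".join(READOUT_AFTER_BISULFITE.get(b, b) for b in template.upper())


if __name__ == "__main__":
    templates = {"C": "ACGTCGA", "5mC": "AMGTMGA", "5hmC": "AHGTHGA"}
    for label, seq in templates.items():
        print(label, seq, "->", bisulfite_readout(seq))
    # The 5mC and 5hmC templates yield identical reads.
    print(bisulfite_readout(templates["5mC"]) == bisulfite_readout(templates["5hmC"]))
```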

It should be possible to take advantage of our findings, in some embodiments, by combining ligation-mediated PCR with primer extension under suboptimal extension conditions to determine the location of 5-hydroxymethylcytosine in the genome. It is unclear how cytosine-5-methylsulfonate inhibits PCR. Rein et al. proposed that cytosine-5-methylsulfonate would block DNA polymerase by analogy to oxidative pyrimidine adducts such as thymine glycol. However, cytosine-5-methylsulfonate retains aromaticity, whereas it has since been demonstrated that polymerases are disrupted by thymine glycol's loss of aromaticity and consequent adoption of a chair geometry. Whatever the mechanism, the observation that 5-hydroxymethylcytosine can stall Taq polymerase after bisulfite treatment may have important ramifications for the interpretation of previous DNA methylation analyses, as discussed herein.

The Effect of 5-Hydroxymethylcytosine on Sodium Bisulfite-Based Analysis of DNA Methylation Status

Cytosine methylation, typically found in the context of CpG sequences, is critical in vertebrates and performs functions such as regulation of transcription and silencing of transposable elements (W. Reik, Nature 447, 425 (May 24, 2007)). Recently, we predicted that the TET family of proteins would oxidize 5-methylcytosine to 5-hydroxymethylcytosine (L. M. Iyer, et al., Cell Cycle 8, 1698 (2009)). Acting on this prediction, we found that expression of the catalytic domain (CD) of human TET1 in 293T cells caused formation of 5-hydroxymethylcytosine and a corresponding loss of 5-methylcytosine. Recombinant human TET1 CD efficiently oxidized 5-methylcytosine to 5-hydroxymethylcytosine in vitro. We also found that 5-hydroxymethylcytosine is present in mammalian DNA and is particularly abundant in embryonic stem (ES) cells. In murine ES cells, siRNA knockdown of Tet1 and Tet2 causes a reduction in observed 5-hydroxymethylcytosine levels (M. Tahiliani et al., Science 324, 930 (2009)). Independently, another group reported the presence of 5-hydroxymethylcytosine in Purkinje neurons (S. Kriaucionis, N. Heintz, Science 324, 929 (2009)).

TET proteins include three recognizable domains: a CXXC domain, which in other proteins is involved in binding of unmethylated CpG motifs; a double-stranded beta-helix (DSBH), which contains the catalytic residues; and a cysteine-rich region. The function of this last domain is unclear, but based on its similarity to zinc finger domains and its position relative to the DSBH, it may be involved in DNA binding.

Very little is known about the physiological role of TET proteins or 5-hydroxymethylcytosine. The DSBH of TET1 is found in a fusion with the oncogene MLL in rare leukemias (R. B. Lorsbach et al., Leukemia 17, 637 (2003); R. Ono et al., Cancer Res 62, 4075 (2002)). Null mutations of TET2 are found in a significant fraction of patients with AML or precancerous myelodysplastic disorders, and TET2 is thus believed to be a tumor suppressor that is lost early in the development of myeloid tumors (S. M. Langemeijer et al., Nat Genet 41, 838 (2009); F. Delhommeau et al., N Engl J Med 360, 2289 (2009)). The mechanism of TET's role in cancer is undetermined. Tet2-deficient mice die shortly after birth, again for unknown reasons (H. Tang, et al., Transgenic Res 17, 599 (2008)).

While 5-hydroxymethylcytosine has no known function, without wishing to be limited by a theory, it is thought that it might facilitate demethylation either by “flagging” methylated cytosines for removal or by blocking maintenance methylation. Without wishing to be limited by a theory, it may also have a role in blocking 5-methylcytosine binding proteins or recruiting as yet undiscovered 5-hydroxymethylcytosine binding proteins.

In one embodiment, we can determine whether hydroxymethylation leads to active and/or passive demethylation of 5-methylcytosine in DNA. As discussed, 5-hydroxymethylcytosine may lead, without wishing to be bound by a theory, to demethylation by an active or passive mechanism. An active mechanism might entail removal of 5-hydroxymethylcytosine by DNA repair machinery, which, without wishing to be limited or bound to a theory, is most likely base excision repair, which is typically used to remove lesions that do not disrupt the broad structure of DNA (V. Valinluck, et al., Nucleic Acids Res 33, 3057 (2005)). Most DNA glycosylases generate abasic sites or 3′ phospho α,β-unsaturated aldehydes, both of which react with an aldehyde-specific molecule called ARP (FIG. 28). Removal of these repair intermediates is the rate-limiting step in DNA repair, and thus large-scale glycosylase activity would be predicted, without wishing to be constrained by a theory, to generate many aldehydes in DNA, which could be measured via ARP. We found that in 293T cells, expression of the TET1 catalytic domain (CD) did not cause a significant increase in aldehyde density (FIG. 29). We considered MBD4 to be a likely glycosylase to remove 5-hydroxymethylcytosine, as it is known to repair the somewhat analogous compound 5-bromocytosine (V. Valinluck, et al., Nucleic Acids Res 33, 3057 (2005)) and it binds to methylated DNA (B. L. Parsons, Proc Natl Acad Sci USA 100, 14601 (2003)). Also, an MBD4 homologue is fused to a distant TET homologue in some algae species (L. M. Iyer, et al., Cell Cycle 8, 1698 (2009)). However, coexpressing MBD4 with TET1 CD did not significantly increase abasic sites (FIG. 29), reduce 5-hydroxymethylcytosine levels, or increase cytosine levels.

Meanwhile, it has become clear that in 293T cells TET's main effect is to convert 5-methylcytosine to 5-hydroxymethylcytosine. Only a modest rise in cytosine is observed upon TET expression, which could arise via blocking of maintenance methylation as opposed to repair (M. Tahiliani et al., Science 324, 930 (2009)). Also, the simple fact that cells can tolerate such high levels of 5-hydroxymethylcytosine would seem to indicate, without wishing to be bound by a theory, that at least in 293T cells, large-scale glycosylase activity is not occurring. We have cloned a number of DNA repair proteins (MBD4, SMUG1, TDG, NTHL1, NEIL1, NEIL2 and APEX1), and can test their involvement in resolution of 5-hydroxymethylcytosine. We can do this by expressing the enzymes in mammalian cells, then determining whether any 5-hydroxymethylcytosine glycosylase activity is present in the lysate by monitoring cleavage of a 5-hydroxymethylcytosine-containing oligonucleotide. For example, in one aspect we can express a test glycosylase of interest in 293T cells. We can generate and end-label oligonucleotides, where at least one oligonucleotide has 5-hydroxymethylcytosine residues and another (positive control) oligonucleotide has a known substrate for the test glycosylase. The glycosylase-expressing 293T cells are then lysed and the oligonucleotides are added to the lysate. The oligonucleotides are then exposed to alkaline conditions in order to cleave the DNA backbone at any abasic sites generated by the glycosylase. The oligonucleotides are then run on a denaturing gel to detect breaks as described herein. If both the hydroxymethylated and positive control oligonucleotides are cut, the test glycosylase recognizes 5-hydroxymethylcytosine. If only the positive control oligonucleotide is cut, the test glycosylase does not recognize 5-hydroxymethylcytosine. If neither the hydroxymethylated nor the positive control oligonucleotide is cut, the test glycosylase is not active under the conditions used in the assay.
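The three interpretive outcomes of this cleavage assay form a small decision table, sketched below in Python. The enum and function names are hypothetical. The text does not address the case in which the 5-hydroxymethylcytosine oligonucleotide is cut but the positive control is not, so the sketch simply flags that combination as not covered; that handling is an editorial assumption.

```python
from enum import Enum


class Outcome(Enum):
    """Interpretations of the end-labeled oligonucleotide cleavage assay."""
    RECOGNIZES_5HMC = "test glycosylase recognizes 5-hydroxymethylcytosine"
    DOES_NOT_RECOGNIZE_5HMC = "test glycosylase does not recognize 5-hydroxymethylcytosine"
    INACTIVE = "test glycosylase not active under the assay conditions"
    # Assumption: this combination is not addressed in the source text.
    NOT_COVERED = "outcome not covered by the assay design"


def interpret(hmc_oligo_cut: bool, control_oligo_cut: bool) -> Outcome:
    """Map the two denaturing-gel readouts (cleaved or not) to an interpretation."""
    if control_oligo_cut:
        # The enzyme is active; the question is whether it also cuts 5hmC.
        if hmc_oligo_cut:
            return Outcome.RECOGNIZES_5HMC
        return Outcome.DOES_NOT_RECOGNIZE_5HMC
    if not hmc_oligo_cut:
        return Outcome.INACTIVE
    return Outcome.NOT_COVERED


if __name__ == "__main__":
    for hmc in (True, False):
        for ctrl in (True, False):
            print(f"5hmC cut={hmc!s:5} control cut={ctrl!s:5} -> {interpret(hmc, ctrl).value}")
```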

In another aspect, we can also determine whether hydroxymethylation blocks maintenance methylation. Without wishing to be bound by a theory, DNMT1 might not efficiently methylate cytosines at CpGs opposite hydroxymethylated CpGs, an observation with some in vitro backing (V. Valinluck, and L. C. Sowers, Cancer Res 67, 946 (2007)). Also, it has been observed that methylation activates DNMT1 allosterically (R. Goyal, et al., Nucleic Acids Res 34, 1182 (2006); Z. M. Svedruzic, Curr Med Chem 15, 92 (2008)), and hydroxymethylation may not have this effect. Finally, DNMT1 requires the partner protein UHRF1, which selectively binds hemimethylated CpGs, for localization to newly replicated DNA (M. Bostick et al., Science 317, 1760 (2007); J. Sharif et al., Nature 450, 908 (2007)). Inhibition of UHRF1 binding could also block maintenance methylation.

We have expressed recombinant UHRF1 and shown that it has modestly impaired binding to hemihydroxymethylated, as opposed to hemimethylated, DNA, as determined by an electrophoretic mobility shift assay (EMSA). We saw some binding to unmethylated DNA, which was not observed in past work (M. Bostick et al., Science 317, 1760 (2007); C. Qian et al., J Biol Chem 283, 34490 (2008)), possibly because of the use of different blocking agents. We can also better replicate the conditions used in past work and determine the preference for hemimethylated over hemihydroxymethylated DNA under these conditions. We can also determine whether maintenance methylation of hydroxymethylated DNA is impaired. Episomal plasmids have been shown to maintain methylation faithfully through many cell divisions and are relatively easy to manipulate (C. L. Hsieh, Mol Cell Biol 14, 5487 (1994)), and we can compare the maintenance of methylated versus hydroxymethylated episomes.

We can also evaluate and discover methods for determining where hydroxymethylcytosine residues are located in DNA.

The discovery of 5-hydroxymethylcytosine in mammalian DNA forces a reassessment of old techniques used to differentiate methylated and unmethylated cytosine. Furthermore, determination of the physiological role of 5-hydroxymethylcytosine requires knowledge of where in the genome 5-hydroxymethylcytosine is located, and we have developed methods of tagging and precipitating 5-hydroxymethylcytosine for use in chromatin immunoprecipitation.

In T4 phage, all cytosines are hydroxymethylated and subsequently glucosylated by the enzymes α-glucosyltransferase (AGT) or β-glucosyltransferase (BGT) (S. R. Kornberg, et al., J Biol Chem 236, 1487 (1961)) (FIG. 30). We have succeeded in producing recombinant BGT. Thus, we can glucosylate sites of hydroxymethylation and label them via the mechanism described in FIG. 31. We treated bacterial plasmid and T4 phage DNA with periodate, and then used the same aldehyde quantification method described above. Only periodate-treated T4 phage DNA showed substantial aldehyde presence (FIG. 32).

In one embodiment, glucosylation conditions for hydroxymethylated DNA can be optimized, and the extent of glucosylation can be measured by TLC. Periodate treatment can be optimized and binding to beads with hydrazide moieties can be performed, in order to perform specific pulldown of hydroxymethylated and glucosylated DNA. Such methods can be used, for example, to perform chromatin immunoprecipitation (ChIP) to determine sites of in vivo genomic hydroxymethylation.

We can determine likely sites of hydroxymethylation by determining the binding specificities of TET1. We individually expressed domains from TET proteins and tested their DNA binding properties via EMSAs. Other CXXC domains have been found to bind unmethylated CpGs, so we expressed the CXXC domains of TET1 and TET3 to test this specificity. We found that the CXXC domains in TET proteins are very positively charged and seem to bind non-specifically to all DNA in vitro. In parallel, we expressed the CXXC domain of CXXC1, which has been demonstrated to bind to unmethylated CpGs. Under the same conditions used for the TET proteins, this domain bound specifically. We found that the catalytic domain as a whole and the DSBH domain of TET bind DNA, but again with no specificity, not even for methylated CpG, which is TET's substrate. Without wishing to be bound by a theory, this may be due to non-specific binding of DNA to a largely unconserved, positively charged region of the DSBH, which is unlikely to actually interact with DNA in vivo because of its predicted position on the protein.

In one aspect, we can also generate mice in which one or more of the TET family genes is genetically ablated (“knock-out mice”), in a lineage-specific or inducible manner (“conditional knock-out mice”). We have successfully generated Tet1 and Tet2 conditional knock-out mice. We have successfully generated Tet3 conditional KO mice possessing a high degree of chimerism, and are confirming germline transmission, after which we can breed mice fully deficient for Tet3 and analyze their phenotype. We have shown that Tet3 is expressed in many tissues, so subsequent experiments on the mice will be guided by phenotype.

Identifying 5-Hydroxymethylcytosine Using Antibodies to Cytosine Methylene Sulfonate

The invention also provides, in part, the use of antibodies to cytosine methylene sulfonate to identify 5-hydroxymethylcytosine residues in genomic DNA and to isolate DNA comprising such 5-hydroxymethylcytosine residues by immunoprecipitation, for use, for example, in analyses of cancer cells.

We have produced a rabbit antiserum specific for cytosine methylene sulfonate, the product of bisulfite treatment of 5-hydroxymethylcytosine, and have shown that this antiserum is highly specific for, and can be used to quantify, the 5-hydroxymethylcytosine residues present in a sample, such as genomic DNA. We have shown that this rabbit antiserum can be used to demonstrate the inhibition of TET family activity, for example, when TET family activity is inhibited by the use of one or more siRNAs specific for TET family members, such as TET1 or a combination of TET1 and TET2. For example, a bisulfite-treated sample, such as a genomic DNA sample, can be digested with an enzyme, such as MseI, which cleaves at TTAA sequences. The digested DNA can then be end-labeled with ³²P. The digested and labeled DNA can then be incubated with an antibody or antiserum specific for cytosine methylene sulfonate, and immobilized, for example, with anti-rabbit IgG beads. Radioactivity can then be measured using a scintillation counter, and the count data used to ascertain the amount of 5-hydroxymethylcytosine present in the DNA. An example of such an assay is shown in FIG. 19.
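The MseI digestion step can be previewed in silico, for example to estimate fragment sizes before end-labeling. The sketch below is an illustration under stated assumptions rather than part of the described method: it scans a plain A/C/G/T string for the MseI site TTAA, cuts between the first T and TAA (the T^TAA cleavage position of MseI), reports top-strand fragments only, and ignores any effect of uracil or cytosine methylene sulfonate in bisulfite-treated DNA on cleavage. The function name and demo sequence are hypothetical.

```python
from __future__ import annotations

MSEI_SITE = "TTAA"
MSEI_CUT_OFFSET = 1  # MseI cleaves T^TAA, i.e. after the first base of the site


def msei_fragments(seq: str) -> list[str]:
    """Return top-strand fragments of an in silico MseI digest of `seq`."""
    seq = seq.upper()
    cuts = []
    start = seq.find(MSEI_SITE)
    while start != -1:
        cuts.append(start + MSEI_CUT_OFFSET)
        start = seq.find(MSEI_SITE, start + 1)

    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


if __name__ == "__main__":
    demo = "GGTTAACCGATTAAGC"  # hypothetical sequence containing two MseI sites
    frags = msei_fragments(demo)
    print(frags, [len(f) for f in frags])
```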

In another such example, genomic DNA from ES cells transfected with siRNA sequences specific for one or more TET family members, such as TET1 or a combination of TET1 and TET2, is bisulfite treated, digested with an enzyme, labeled, and incubated with antiserum specific for cytosine methylene sulfonate, and the amount of cytosine methylene sulfonate residues can be quantified against a standard curve generated using a known oligonucleotide containing cytosine methylene sulfonate. The impact of TET family inhibition on the generation of 5-hydroxymethylcytosine can then be compared between the samples. The presence of less cytosine methylene sulfonate in a sample treated with a TET family inhibitor, such as an siRNA sequence, is indicative of the specificity of that siRNA for the TET family member.
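Quantifying cytosine methylene sulfonate against a standard curve amounts to a linear fit of signal versus known amount, followed by interpolation of the unknown samples. The sketch below assumes the signal (for example, bound counts) is linear in the amount of standard oligonucleotide over the range used; that assumption, the function names, and all numbers are hypothetical and are not taken from the experiments described.

```python
from __future__ import annotations


def fit_standard_curve(amounts: list[float], signals: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    if len(amounts) != len(signals) or len(amounts) < 2:
        raise ValueError("need at least two paired standards")
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Interpolate an unknown sample's amount from its measured signal."""
    return (signal - intercept) / slope


def percent_of_control(treated: float, control: float) -> float:
    """Express an siRNA-treated sample as a percentage of its control sample."""
    return 100.0 * treated / control


if __name__ == "__main__":
    # Hypothetical standards: fmol of methylene-sulfonate oligo vs. bound counts.
    std_fmol = [0.0, 1.0, 2.0, 4.0, 8.0]
    std_counts = [100.0, 600.0, 1100.0, 2100.0, 4100.0]
    slope, intercept = fit_standard_curve(std_fmol, std_counts)
    control = amount_from_signal(3100.0, slope, intercept)
    knockdown = amount_from_signal(1350.0, slope, intercept)
    print(f"control {control:.2f} fmol, knockdown {knockdown:.2f} fmol, "
          f"{percent_of_control(knockdown, control):.0f}% of control")
```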

In yet another example, the amount of 5-hydroxymethylcytosine in a patient who has mutations in one or more TET family members and suffers from a malignant condition can be ascertained by bisulfite treatment of DNA obtained from that patient, followed by assaying the DNA for cytosine methylene sulfonate using the antiserum described herein, as shown in FIG. 21 and FIG. 33. Genomic DNA was isolated from patients having the following mutations in TET2, and diagnosed with the cancerous conditions shown in parentheses:

CCF2032-S631stop-somatic (CD3 negative), heterozygous mutation, (MDS/MPD, MDS/MPD-U<5%)
CCF2148-S509stop-somatic (CD3 negative), hemizygous mutation, pt with del4q24, (MDS, RARS)
CCF2674-ins1310T-somatic (CD3 negative), homozygous mutation, pt with UPD4q, (MDS/MPD, CMML-1)
CCF5936-ins318A-homozygous mutation, SNP-A results pending, (CML)

CCF852-WT TET2, (MDS/MPD, CMML-2)
CCF4018-WT TET2, (MDS/MPD, CMML-1)

The isolated DNA was then either bisulfite treated or left untreated, digested, and labeled with ³²P. The bisulfite-treated DNA was incubated with antiserum specific for cytosine methylene sulfonate, while the untreated DNA was incubated with antibodies specific for 5-hydroxymethylcytosine to immunoprecipitate the genomic regions containing 5-hydroxymethylcytosine. The immunoprecipitated DNA was then analyzed as dot blots by phosphorimaging and compared to serial dilutions of a standard control having a known quantity of cytosine methylene sulfonate or 5-hydroxymethylcytosine, such as cytosine methylene sulfonate or 5-hydroxymethylcytosine oligonucleotides. In the examples shown in FIG. 21 and FIG. 33, we show that patients CCF2148 and CCF2674 have significantly less 5-hydroxymethylcytosine than patients CCF852 and CCF4018, who have wild-type TET2. This demonstrated that the somatic TET2 mutations in patients CCF2148 and CCF2674 are functional and directly impact TET2-mediated conversion of 5-methylcytosine to 5-hydroxymethylcytosine.

Role of TET Proteins in Leukemia

It has been observed that TET2 mutations, but not TET1 or TET3 mutations, occur at high frequency in various myeloid cancers, including MDS, MPD, AML, secondary AML, systemic mastocytosis, and CMML. It has been shown that TET2 is the most commonly mutated gene in MDS, and it thus serves as a very useful prognostic marker.

TET2 mutations are present in both multipotent and committed progenitor cells from MPD patients. TET2 mutations have been found in patients with both JAK2 V617F-positive and -negative MPD, and these mutations have been proposed to be a pre-JAK2 event. It has been shown that TET2 missense mutations, but not frameshift or nonsense mutations or deletions, are enriched in two conserved regions that cover the catalytic core of TET proteins containing the C and D domains, as shown in FIG. 34. We postulate that these numerous heterozygous missense mutations have dominant negative roles that promote malignant transformation.

We have shown that TET1 and TET2 have differential expression patterns when both bone marrow and thymic hematopoietic progenitor cell subsets are examined. As shown in FIG. 35, TET2 is expressed most highly in Gr-1⁻Mac-1⁺ myeloid lineage bone marrow cells; in pre-B, immature B, and mature B lymphoid lineage bone marrow cells; and in DN1, DP, CD4+SP, and CD8+SP thymic lymphoid lineage cells. As shown in FIG. 36, TET1 is expressed most highly in DP, CD4+SP, and CD8+SP thymic lymphoid lineage cells.

In order to determine the role of TET2 in leukemia and malignant transformation, and the cooperation between TET2 and JAK2 mutations, Lin⁻c-kit⁺ bone marrow cells can be isolated and transduced with the following combinations of retroviral vectors: LMP-GFP and MSCV-IRES-hCD4; LMP-shTet2-GFP and MSCV-IRES-hCD4; LMP-GFP and MSCV-JAK2 V617F-IRES-hCD4; and LMP-shTet2-GFP and MSCV-JAK2 V617F-IRES-hCD4, where shTet2 is an shRNA specific for Tet2. Cells can then be sorted on the basis of GFP and hCD4 expression, using techniques known to one of skill in the art. The sorted cells can then be compared for growth kinetics, transforming activity, and in vivo tumorigenesis. For example, sorted cells can be transferred into lethally irradiated mice to investigate their in vivo tumorigenic capacity.
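The four transduction arms listed above form a 2 × 2 factorial design (Tet2 knockdown absent or present, crossed with JAK2 V617F absent or present), which is what allows cooperation between the two lesions to be assessed against each lesion alone. The sketch below simply enumerates the arms from the vector names given in the text; the grouping into factors, the dictionary layout, and the assumed GFP+ hCD4+ sort gate are editorial assumptions.

```python
from __future__ import annotations

from itertools import product

# Vector names as given in the text; the factor labels are an editorial framing.
GFP_VECTORS = ("LMP-GFP", "LMP-shTet2-GFP")                    # control vs Tet2 shRNA
HCD4_VECTORS = ("MSCV-IRES-hCD4", "MSCV-JAK2 V617F-IRES-hCD4")  # control vs JAK2 V617F


def transduction_arms() -> list[dict]:
    """Enumerate every GFP-vector x hCD4-vector combination with its factors."""
    arms = []
    for gfp_vector, hcd4_vector in product(GFP_VECTORS, HCD4_VECTORS):
        arms.append({
            "gfp_vector": gfp_vector,
            "hcd4_vector": hcd4_vector,
            "tet2_knockdown": "shTet2" in gfp_vector,
            "jak2_v617f": "V617F" in hcd4_vector,
            # Assumption: doubly transduced cells are sorted as GFP+ hCD4+.
            "sort_gate": "GFP+ hCD4+",
        })
    return arms


if __name__ == "__main__":
    for arm in transduction_arms():
        print(arm["gfp_vector"], "+", arm["hcd4_vector"])
```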

As shown in FIG. 37, expression of the shTet2#3 sequence results in decreased expression of Tet2 in c-kit⁺ bone marrow cells, as assessed by quantitative PCR analysis. Further, we show that expression of the shTet2#3 sequence results in decreased protein expression of a Myc-tagged Tet2 protein (FIG. 37).

Without wishing to be bound or limited by a theory, we postulate that the TET family of epigenetic modulators serves as a potential link between energy metabolism and tumor suppression. Isocitrate dehydrogenases (IDHs) are metabolic enzymes of the TCA cycle that catalyze the oxidative decarboxylation of isocitrate to α-ketoglutarate (α-KG). IDHs can be classified into two groups, depending on the type of electron acceptor: (1) NAD+-dependent isocitrate dehydrogenases, such as IDH3A, IDH3B, and IDH3G, which form an α2βγ heterotetramer, catalyze an irreversible step of the TCA cycle, and are found in the mitochondrial matrix; and (2) NADP+-dependent isocitrate dehydrogenases, such as IDH1 and IDH2, which form homodimers, are involved in NADPH regeneration for anabolic pathways, and are found in the mitochondrial matrix (IDH2) or the cytoplasm/peroxisome (IDH1). It is known that recurrent somatic, dominant negative mutations occur at R132 of IDH1 in glioblastoma multiforme (GBM; ~12%) and myeloid leukemia. Without wishing to be bound by a theory, we postulate that the R132 mutation impairs IDH1 homodimer formation, resulting in impaired α-KG generation, which results in TET family inactivation and consequent tumorigenesis, as diagrammed in FIG. 38.
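The proposed link between IDH1 and TET activity can be made explicit by writing out the two reactions involved. The first is the standard NADP+-dependent IDH reaction. The second assumes the canonical stoichiometry of Fe(II)/α-KG-dependent dioxygenases, the enzyme class to which TET proteins were predicted to belong (L. M. Iyer, et al., Cell Cycle 8, 1698 (2009)). Written this way, α-KG is a product of the first reaction and a required co-substrate of the second, so under the stated hypothesis a reduced α-KG supply would be expected to limit hydroxylation of 5-methylcytosine.

```latex
\begin{align*}
\text{isocitrate} + \text{NADP}^{+}
  &\xrightarrow{\ \text{IDH1, IDH2}\ } \alpha\text{-KG} + \text{CO}_2 + \text{NADPH} \\
\text{5mC (in DNA)} + \alpha\text{-KG} + \text{O}_2
  &\xrightarrow{\ \text{TET, Fe(II)}\ } \text{5hmC (in DNA)} + \text{succinate} + \text{CO}_2
\end{align*}
```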

Detection of Radiolabeled Glucose Added to 5-Hydroxymethylcytosine

DNA is incubated with alpha-glucosyltransferase or beta-glucosyltransferase in the presence of radiolabeled uridine diphosphate (UDP) glucose, either UDP-¹⁴C-glucose or UDP-³H-glucose, and the DNA is purified. If 5-hydroxymethylcytosine is present in the DNA, the radiolabel co-purifies with the DNA and is detected by liquid scintillation counting, autoradiography, or other means. In some embodiments, the DNA is first contacted with one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments to convert 5-methylcytosine to 5-hydroxymethylcytosine.
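Converting scintillation counts into an amount of 5-hydroxymethylcytosine is simple arithmetic, sketched below. The sketch assumes that one glucose is transferred per 5-hydroxymethylcytosine, that background is measured in a parallel reaction without glucosyltransferase, and that the counting efficiency and the specific activity of the UDP-glucose lot are known. These are assumptions, and the function name and numbers are hypothetical.

```python
def pmol_glucosylated_5hmc(sample_cpm: float, background_cpm: float,
                           counting_efficiency: float,
                           specific_activity_dpm_per_pmol: float) -> float:
    """Estimate pmol of 5hmC labeled with radioactive glucose.

    Assumes one glucose per 5-hydroxymethylcytosine, so pmol glucose
    incorporated equals pmol 5hmC detected.
    """
    if not 0.0 < counting_efficiency <= 1.0:
        raise ValueError("counting efficiency must be in (0, 1]")
    if specific_activity_dpm_per_pmol <= 0:
        raise ValueError("specific activity must be positive")
    net_cpm = max(sample_cpm - background_cpm, 0.0)  # counts above no-enzyme control
    dpm = net_cpm / counting_efficiency               # correct for detector efficiency
    return dpm / specific_activity_dpm_per_pmol


if __name__ == "__main__":
    # Hypothetical values for a UDP-14C-glucose labeling reaction.
    pmol = pmol_glucosylated_5hmc(sample_cpm=9100.0, background_cpm=100.0,
                                  counting_efficiency=0.9,
                                  specific_activity_dpm_per_pmol=2000.0)
    print(f"{pmol:.2f} pmol 5hmC")
```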

Detection of Non-Radiolabeled Glucose Added to 5-Hydroxymethylcytosine

Non-radioactive UDP glucose is used as a substrate, and the resulting alpha-glucose-5-hydroxymethylcytosine or beta-glucose-5-hydroxymethylcytosine is detected by further chemical reaction or protein binding. Examples of such a protein include an antibody or lectin that recognizes alpha-glucose-5-hydroxymethylcytosine or beta-glucose-5-hydroxymethylcytosine, or an enzyme, such as hexokinase or beta-glucosyl-alpha-glucosyl-transferase, that adds further modifications to the alpha-glucose-5-hydroxymethylcytosine or beta-glucose-5-hydroxymethylcytosine. In some embodiments, the DNA is first contacted with one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments to convert 5-methylcytosine to 5-hydroxymethylcytosine.

Detection of Methylcytosine and 5-Hydroxymethylcytosine Using Covalent Trapping

A UDP glucose analog that fosters covalent trapping of the enzyme-DNA intermediate is used as a substrate, such that when DNA is incubated with alpha-glucosyltransferase or beta-glucosyltransferase, any 5-hydroxymethylcytosine-containing DNA is tagged with alpha-glucosyltransferase or beta-glucosyltransferase. The DNA either has naturally occurring 5-hydroxymethylcytosine residues or is contacted with one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments to convert 5-methylcytosine to 5-hydroxymethylcytosine. Also, the alpha-glucosyltransferase or beta-glucosyltransferase is created with one or more protein or non-protein tags to facilitate detection or isolation of the covalently linked enzyme-DNA complexes.

Modification and Detection of Methylcytosine and 5-Hydroxymethylcytosine

Naturally occurring 5-hydroxymethylcytosine, or 5-hydroxymethylcytosine created by conversion of 5-methylcytosine, in nucleic acids such as DNA is converted to glucose-5-hydroxymethylcytosine with alpha-glucosyltransferase or beta-glucosyltransferase and is further glycosylated using beta-glucosyl-alpha-glucosyl-transferase. The beta-glucosyl-alpha-glucosyl-transferase adds radioactively labeled glucose from UDPG to glucose-5-hydroxymethylcytosine. Alternatively, beta-glucosyl-alpha-glucosyl-transferase is used with substrates other than UDPG, such as UDP-2-deoxy-2-fluoro-glucose, to covalently trap the enzyme with its substrates. This allows tagging of methylcytosine or 5-hydroxymethylcytosine with a protein. Beta-glucosyl-alpha-glucosyl-transferase is also created with several protein or non-protein tags to facilitate detection or isolation of the covalently linked complex of beta-glucosyl-alpha-glucosyl-transferase with glucose-5-hydroxymethylcytosine DNA.

The gentibiosyl (gentiobiosyl) residue in gentibiose-containing 5-hydroxymethylcytosine, which results from addition of a second glucose to glucose-5-hydroxymethylcytosine DNA by beta-glucosyl-alpha-glucosyl-transferase, is detected using non-covalent methods. Detection methods include exploiting the binding of gentibiosyl residues to proteins with an affinity for this residue, such as (1) antibodies specific to gentibiose-containing 5-hydroxymethylcytosine or (2) lectins with affinity to gentibiosyl, such as Musa acuminata lectin (BanLec).

Lectins and antibodies further modified with tags such as biotin or beads are used for solid-phase purification of gentibiose-containing 5-hydroxymethylcytosine-containing DNA. Lectins and antibodies modified with gold or fluorescent tags are used for electron microscopic or immunofluorescent detection, respectively, of gentibiose-containing 5-hydroxymethylcytosine-containing DNA.

If desired, the covalent linkages of glucose and gentibiosyl modifications in gentibiose-containing 5-hydroxymethylcytosine and glucose-containing 5-hydroxymethylcytosine are reversed by chemical means or by enzymes such as alpha- and beta-glucosidases, thus liberating the 5-hydroxymethylcytosine-containing DNA for further downstream applications. One example of these methods is shown in FIG. 4.

To detect 5-hydroxymethylcytosine, the 5-hydroxymethyl residue of 5-hydroxymethylcytosine is converted to the 5-hydroxymethylenesulfonate residue by sodium hydrogen sulfite, and then detected with antibodies to the modified residue.

Downstream applications that utilize the covalently and non-covalently tagged methylcytosine and 5-hydroxymethylcytosine include: (i) detection of methylcytosine and 5-hydroxymethylcytosine in cells or tissues directly by fluorescence or electron microscopy; (ii) detection of methylcytosine and 5-hydroxymethylcytosine by assays including blotting or linked enzyme-mediated substrate conversion with radioactive, colorimetric, luminescent or fluorescent detection; and (iii) separation of the tagged DNA from untagged DNA by enzymatic, chemical or mechanical treatments, and fractionation of either the tagged or untagged DNA by precipitation with beads, magnetic means, fluorescent sorting, or other means, followed by application to whole-genome analyses such as microarray hybridization and high-throughput sequencing.

Diagnostic Methods for Assessing Global Methylcytosine and 5-Hydroxymethylcytosine Levels

Global levels of methylcytosine and/or 5-hydroxymethylcytosine, i.e., the “methylome” or “hydroxymethylome” signatures, in diseased tissue samples, such as bone marrow from patients with MDS, MPD, or AML, are assessed to aid in disease diagnosis, permit disease classification, risk-stratify patients, direct therapy, and monitor responses to therapy.

Genetic Tests for Methylcytosine and 5-Hydroxymethylcytosine Levels

Levels of methylcytosine and/or 5-hydroxymethylcytosine are determined in cells from family members of people affected with a disease, to determine whether they might harbor the disease. 5-Hydroxymethylcytosine levels are determined, in a non-limiting example, in the CD34+ hematopoietic cells of a family member of someone with MDS, MPD, or AML to determine whether there is a familial predisposition.

Kits and Methods for Detection of Methylcytosine and 5-Hydroxymethylcytosine in Genomes

Whole genomic DNA is mixed with control DNA and sheared to a desired size (average around 200 bp). The DNA is subjected to conversion of methylcytosine to 5-hydroxymethylcytosine mediated by one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments in the appropriate buffer. DNA is purified on a spin column. The 5-hydroxymethylcytosine-converted DNA is then treated simultaneously with alpha-glucosyltransferase or beta-glucosyltransferase and beta-glucosyl-alpha-glucosyl-transferase enzyme in a UDPG-containing buffer. DNA is purified on a spin column. Biotinylated BanLec is rocked with the gentibiose-containing, 5-hydroxymethylcytosine-converted DNA. Streptavidin agarose beads are added. Streptavidin-biotin-BanLec-gentibiose-containing 5-hydroxymethylcytosine-containing DNA complexes are precipitated and washed in buffer, and the supernatant containing unmethylated cytosine-containing DNA is saved for analysis. The beads are treated with methyl-alpha-mannoside to release the lectin and with glucosidases to cleave the gentiobiosyl residue, and the eluate is purified over a DNA spin column. The purified DNA is subjected to further analysis, such as microarray, direct sequencing, or PCR-based assays.

An internal standard of lambda DNA carrying cytosine methylation at BamHI sites is used to determine the efficiency and specificity of 5-hydroxymethylcytosine detection, using PCR primer pairs flanking and not flanking BamHI sites in the lambda genome.
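The text does not specify how this internal standard is read out. One common approach, assumed here, is qPCR percent-of-input: recovery of an amplicon in the pulled-down fraction relative to the input, with an amplicon flanking the modified BamHI sites compared against one that does not. The ratio of the two recoveries then serves as a specificity (enrichment) estimate. The sketch assumes roughly 100% PCR efficiency; the function names and Ct values are hypothetical.

```python
import math


def percent_input(ct_pulldown: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input DNA recovered in the pulldown for one amplicon.

    input_fraction is the fraction of the starting DNA run as the input
    sample (e.g. 0.01 for a 1% input). Assumes ~100% PCR efficiency.
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in (0, 1]")
    # Adjust the input Ct so it represents 100% of the starting material.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_pulldown)


def specificity_ratio(pct_flanking: float, pct_non_flanking: float) -> float:
    """Enrichment of BamHI-flanking over non-flanking lambda amplicons."""
    return pct_flanking / pct_non_flanking


if __name__ == "__main__":
    # Hypothetical Ct values for a 1% input.
    flanking = percent_input(ct_pulldown=24.0, ct_input=25.0, input_fraction=0.01)
    non_flanking = percent_input(ct_pulldown=30.0, ct_input=25.0, input_fraction=0.01)
    print(f"flanking {flanking:.2f}% input, non-flanking {non_flanking:.3f}% input, "
          f"specificity {specificity_ratio(flanking, non_flanking):.0f}x")
```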

The detection of naturally occurring 5-hydroxymethylcytosine in genomes is performed as above but without the conversion of methylcytosine to 5-hydroxymethylcytosine by one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments.

The kit components comprise: one or more catalytically active TET family enzymes, functional TET family derivatives, or TET catalytic fragments; one or more alpha-glucosyltransferases, beta-glucosyltransferases, or beta-glucosyl-alpha-glucosyl-transferases; biotinylated BanLec; streptavidin agarose beads; methyl-alpha-mannoside; alpha-glucosidase and beta-glucosidase; appropriate buffers, substrate solutions, and DNA purification spin columns; and an internal standard further comprising lambda DNA cytosine-methylated with BamHI methyltransferase and PCR primers.

The present invention can be defined in any of the following numbered paragraphs:

1. A method for improving the generation of stable human Foxp3+ T cells, the method comprising contacting with or delivering to a human T cell an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof.

2. The method of paragraph 1, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4.

3. The method of paragraph 1, wherein the human T cell is a purified human CD4+ T cell.

4. The method of paragraph 1, further comprising generating stable human Foxp3+ T cells by contacting the human T cell with a composition comprising at least one cytokine, growth factor, or activating reagent.

5. The method of paragraph 4, wherein said composition comprises TGF-β.

6. A method for improving the efficiency or rate with which induced pluripotent stem (iPS) cells are produced from somatic cells, the method comprising contacting with, or delivering to, a somatic cell an effective 5-methylcytosine to 5-hydroxymethylcytosine converting amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment thereof, or combination thereof.

7. The method of paragraph 6, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4.

8. The method of paragraph 6, wherein the catalytically active TET family enzyme is TET1 or TET2.

9. The method of paragraph 6, further comprising contacting with or delivering to the somatic cell an effective amount of a TET family inhibitor.

10. The method of paragraph 9, wherein the TET family inhibitor is a TET3 inhibitor.

11. The method of paragraph 6, further comprising inducing iPS cell production by contacting the adult somatic cell with, or delivering to said adult somatic cell, a combination of nucleic acid sequences encoding Oct-4, Sox2, c-MYC, and Klf4.

12. The method of paragraph 11, wherein the combination of nucleic acid sequences encoding Oct-4, Sox2, c-MYC, and Klf4 is delivered in a viral vector selected from the group consisting of an adenoviral vector, a lentiviral vector, and a retroviral vector.

13. The method of paragraph 6, wherein the somatic cell is a fibroblast.

14. A method for improving the efficiency of cloning mammals by nuclear transfer or nuclear transplantation, the method comprising contacting a nucleus extracted from a cell to be cloned with an effective 5-methylcytosine to 5-hydroxymethylcytosine hydroxylating amount of at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, during a nuclear transfer protocol.

15. The method of paragraph 14, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4.

16. The method of paragraph 14, wherein the catalytically active TET family enzyme is TET1 or TET2.

17. The method of paragraph 14, further comprising contacting with or delivering to the somatic cell an effective amount of a TET family inhibitor.

18. The method of paragraph 17, wherein the TET family inhibitor is a TET3 inhibitor.

19. A method for detecting a 5-hydroxymethylcytosine nucleotide in a biological sample, the method comprising contacting a biological sample with a detectably labeled antibody or an antigen binding portion thereof, a labeled intrabody, or a labeled protein, that specifically binds to 5-hydroxymethylcytosine, and detecting the amount of bound label, wherein the presence of the bound label is indicative of the 5-methylcytosine being converted to 5-hydroxymethylcytosine.

20. A kit for modulating gene transcription via hydroxylation of 5-methylcytosine to 5-hydroxymethylcytosine, the kit comprising the following separate components:
(a) at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, or a nucleic acid molecule that comprises a sequence encoding at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, in an appropriate buffer or solution; and
(b) packaging materials and instructions therein to use said kit to hydroxylate 5-methylcytosine to 5-hydroxymethylcytosine, for the purposes of modulating gene transcription.

21. The kit of paragraph 20, wherein the catalytically active TET family enzymes are selected from the group consisting of TET1, TET2, TET3, and CXXC4.

22. The kit of paragraph 20, further comprising at least one cytokine, growth factor, activating reagent, or combination thereof, for the purposes of generating stable human Foxp3+ regulatory T cells.

23. The kit of paragraph 22, wherein the composition comprises TGF-β.

24. The kit of paragraph 20, further comprising at least one nucleic acid sequence encoding Oct-4, Sox2, c-MYC, and Klf4, to be contacted with or delivered to a somatic cell for the purposes of improving the efficiency and rate of induced pluripotent stem cell production.

25. The kit of paragraph 24, wherein the nucleic acid sequences encoding Oct-4, Sox2, c-MYC, and Klf4 are delivered in a viral vector selected from the group consisting of an adenoviral vector, a lentiviral vector, and a retroviral vector.

26. The kit of paragraph 20, further comprising at least one reagent suitable for the detection of 5-hydroxymethylcytosine.

27. 
The kit of paragraph 26, wherein the reagent suitable for thedetection of 5-hydroxymethylcytosine is an antibody or anantigen-binding portion thereof, an intrabody, or a protein, thatspecifically binds to 5-hydroxymethylcytosine.28. The kit of paragraph 26, wherein said reagent suitable for thedetection of 5-hydroxymethylcytosine is specific forcytosine-5-methylsulfonate.29. A method for improving stem cell therapies, the method comprisingcontacting with, or delivering to, a stem cell an effective5-methylcytosine to 5-hydroxymethylcytosine converting amount of atleast one catalytically active TET family enzyme, functional TET familyderivative, TET catalytically active fragment thereof, or combinationthereof, or at least one nucleic acid molecule that comprises a sequenceencoding at least one catalytically active TET family enzyme, functionalTET family derivative, TET catalytically active fragment, or combinationthereof.30. The method of paragraph 29, wherein the catalytically active TETfamily enzyme is selected from the group consisting of TET1, TET2, TET3,and CXXC4.31. A method for treating an individual with or at risk for cancer, themethod comprising administering to an individual with or at risk forcancer an effective amount of an agent that specifically modulateshydroxylase activity of at least one catalytically active TET familyenzyme, functional TET family derivative, TET catalytically activefragment, or combination thereof involved in transforming5-methylcytosine into 5-hydroxymethylcytosine.32. The method of paragraph 31, wherein the catalytically active TETfamily enzyme is selected from the group consisting of TET1, TET2, TET3,and CXXC4.33. The method of paragraph 31, wherein the agent that specificallymodulates hydroxylase activity is an inhibitor.34. The method of paragraph 31, wherein the agent that specificallymodulates hydroxylase activity is an activator.35. The method of paragraph 31, wherein the cancer is a leukemia.36. 
The method of paragraph 35, wherein the leukemia is an acute myeloidleukemia comprising the t(10:11)(q22:q23) Mixed Lineage Leukemiatranslocation of TET1.37. A method for screening for an agent with TET family enzymemodulating activity, the method comprising the steps of:a) providing a cell comprising at least one TET family enzyme,functional TET family derivative, TET catalytically active fragment,recombinant TET family enzyme, or combination thereof;b) contacting said cell with a test agent, thereby creating a testsample; andc) comparing the relative levels of 5-hydroxymethylated cytosine incells expressing the catalytically active TET family enzyme, functionalTET family derivative, TET catalytically active fragment, recombinantTET family enzyme, or combination thereof, in the test sample with thelevel expressed in a control sample; and(d) determining whether or not the test agent increases or decreases thelevel of 5-hydroxymethylated cytosine, wherein a statisticallysignificant decrease in the level of 5-hydroxymethylated cytosineindicates the agent is an inhibitor, and a statistically significantincrease in the level of 5-hydroxymethylated cytosine indicates theagent is an activator.38. The method of paragraph 37, wherein the catalytically active TETfamily enzyme is selected from the group consisting of TET1, TET2, TET3,and CXXC4.39. The method of any of the preceding paragraphs, wherein thefunctional TET family derivative comprises SEQ ID NO: 1.40. The method of any of the preceding paragraphs, wherein the TETfamily catalytically active fragment comprises SEQ ID NO: 2, SEQ ID NO:3, SEQ ID NO: 4, or SEQ ID NO: 5.41. 
A method for covalent tagging 5-hydroxymethylcytosine in a nucleicacid, the method comprising contacting a nucleic acid molecule with anenzyme that adds one or more glucose molecules to a5-hydroxymethylcytosine residue to generateglucosylated-5-hydroxymethylcytosine orgentibiose-containing-5-hydroxymethylcytosine, wherein the enzyme is analpha-glucosyltransferase, a beta-glucosyltransferase, or abeta-glucosyl-alpha-glucosyl-transferase.42. The method of paragraph 41, wherein the 5-hydroxymethylcytosine isnaturally occurring.43. The method of paragraph 41, further comprising the step of firstcontacting said nucleic acid with at least one catalytically active TETfamily enzyme, functional TET family derivative, TET catalyticallyactive fragment thereof, or combination thereof, thereby converting5-methylcytosine to hydroxymethylcytosine.44. The method of paragraph 41, wherein the alpha-glucosyltransferase isencoded by a bacteriophage selected from the group consisting of T2, T4,and T6 bacteriophages.45. The method of paragraph 41, wherein the beta-glucosyltransferase isencoded by a bacteriophage selected from T4 bacteriophages.46. The method of paragraph 41, wherein thebeta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophageselected from the group consisting of T2 and T6 bacteriophages.47. The method of paragraph 41, wherein the nucleic acid is contacted invitro, in a cell, or in vivo.48. A method for detecting 5-hydroxymethylcytosine in a nucleic acid,the method comprising contacting a nucleic acid with an enzyme thatutilizes labeled glucose or glucose-derivative donor substrates to addat least one labeled glucose molecules or glucose-derivatives to a5-hydroxymethylcytosine residue to generateglucosylated-5-hydroxymethylcytosine orgentibiose-containing-5-hydroxymethylcytosine, wherein the enzyme is analpha-glucosyltransferase, a beta-glucosyltransferase, or abeta-glucosyl-alpha-glucosyl-transferase.49. 
The method of paragraph 48, wherein the glucose orglucose-derivative donor substrate is a uridine diphosphate glucose.50. The method of paragraph 48, wherein the labeled glucose orglucose-derivative donor substrates is radioactively labeled.51. The method of paragraph 50, wherein the radioactive label is ¹⁴C or³H.52. The method of paragraph 48, wherein the 5-hydroxymethylcytosine isnaturally occurring.53. The method of paragraph 53, further comprising the step of firstcontacting said nucleic acid with at least one catalytically active TETfamily enzyme, functional TET family derivative, TET catalyticallyactive fragment, or combination thereof, thereby converting5-methylcytosine to 5-hydroxymethylcytosine.54. The method of paragraph 48, wherein the alpha-glucosyltransferase isencoded by a bacteriophage selected from the group consisting of T2, T4,and T6 bacteriophages.55. The method of paragraph 48, wherein the beta-glucosyltransferase isencoded by a bacteriophage selected from T4 bacteriophages.56. The method of paragraph 48, wherein thebeta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophageselected from the group consisting of T2 and T6 bacteriophages.57. The method of paragraph 48, wherein the nucleic acid is contacted invitro, in a cell, or in vivo.58. A method for detecting 5-hydroxymethylcytosine in a nucleic acid,the method comprising contacting the covalently tagged5-hydroxymethylcytosine of claim 41 with a protein that recognizes aglucose molecule, glucose-derivative or gentibiosyl molecule.59. The method of paragraph 58, wherein the protein recognizes only theglucose molecule, glucose-derivative, or gentibiosyl.60. The method of paragraph 58, wherein the protein recognizes theglucose molecule, glucose-derivative, or gentibiosyl only in the contextof 5-hydroxymethylcytosine.61. The method of paragraph 58, wherein the protein is a lectin.62. The method of paragraph 61, wherein the lectin is Musa acuminatalectin.63. 
The method of paragraph 58, wherein the protein is a antibody orantigen-binding fragment thereof.64. The method of paragraph 63, wherein the antibody or antigen-bindingfragment thereof is modified with a tag.65. The method of paragraph 64, wherein the tag is a biotin molecule, abead, a gold particle, or a fluorescent molecule.66. The method of paragraph 58, wherein the protein is an enzyme.67. The method of paragraph 66, wherein the enzyme is hexokinase orbeta-glucosyl-alpha-glucosyl-transferase.68. A method for detecting 5-hydroxymethylcytosine in a nucleic acid,the method comprising contacting a nucleic acid with an enzyme andutilizing glucose or glucose-derivative donor substrates that trapcovalent enzyme-DNA intermediates to detect 5-hydroxymethylcytosineresidues, wherein the enzyme is an alpha-glucosyltransferase, abeta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase.69. The method of paragraph 68, wherein the glucose donor substrate is auridine diphosphate glucose analog.70. The method of paragraph 69, wherein the uridine diphosphate glucoseanalog is undine-2-deoxy-2-fluoro-glucose.71. The method of paragraph 68, wherein the 5-hydroxymethylcytosine isnaturally occurring.72. The method of paragraph 68, further comprising the step of firstcontacting said nucleic acid with at least one catalytically active TETfamily enzyme, functional TET family derivative, TET catalyticallyactive fragment, or combination thereof, thereby converting5-methylcytosine to 5-hydroxymethylcytosine.73. The method of paragraph 68, wherein the enzyme is tagged.74. The method of paragraph 68, wherein the alpha-glucosyltransferase isencoded by a bacteriophage selected from the group consisting of T2, T4,and T6 bacteriophages.75. The method of paragraph 68, wherein the beta-glucosyltransferase isencoded by a bacteriophage selected from T4 bacteriophages.76. 
The method of paragraph 68, wherein thebeta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophageselected from the group consisting of T2 and T6 bacteriophages.77. The method of paragraph 68, wherein the nucleic acid is contacted invitro, in a cell, or in vivo.78. An method to detect 5-hydroxymethylcytosine in a nucleic acid, themethod comprising contacting a nucleic acid with sodium hydrogen sulfiteto convert a 5-hydroxymethylcytosine residue in a nucleic acid to acytosine-5-methylsulfonate, and contacting the sodium hydrogen sulfitecontacted nucleic acid with a protein specific forcytosine-5-methylsulfonate.79. The method of paragraph 78, wherein the protein is an antibody orantigen-binding fragment thereof, an enzyme, or an intrabody.80. The method of paragraph 79, wherein the antibody comprises anantiserum.81. The method of paragraph 79, wherein the antibody or antigen-bindingfragment thereof, enzyme, or intrabody is modified with a tag.82. The method of paragraph 81, wherein the tag is a biotin molecule, abead, a gold particle, or a fluorescent molecule.83. The method of paragraph 78, further comprising isolating the5-hydroxymethylcytosine residue containing nucleic acid with the proteinspecific for cytosine-5-methylsulfonate 84. The method of paragraph 78,wherein the nucleic acid is in vitro, in a cell, or in vivo.85. 
A kit for the detection and purification of methylcytosine and5-hydroxymethylcytosine, the kit comprising:(a) one or more catalytically active TET family enzymes, functional TETfamily derivatives, or TET catalytically active fragments thereof forthe conversion of methylcytosine to 5-hydroxymethylcytosine;(b) one or more enzymes encoded by bacteriophages of the “T even”family;(c) one or more glucose or glucose-derivative donor substrates;(d) one or more proteins to detect glucose or glucose-derivativemodified nucleotides;(e) standard DNA purification columns, buffers, and substrate solutions;and(f) packaging materials and instructions therein to use said kits.86. The kit of paragraph 85, wherein the enzyme encoded bybacteriophages of the “T even” family is selected from the groupconsisting of alpha-glucosyltransferases, beta-glucosyltransferases, andbeta-glucosyl-alpha-glucosyl-transferases.87. The kit of paragraph 86, wherein the alpha-glucosyltransferase isencoded by a bacteriophage selected from the group consisting of T2, T4,and T6 bacteriophages.88. The kit of paragraph 86, wherein the beta-glucosyltransferase isencoded by a bacteriophage selected from T4 bacteriophages.89. The kit of paragraph 86, wherein thebeta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophageselected from the group consisting of T2 and T6 bacteriophages.90. The kit of paragraph 85, wherein the glucose or glucose-derivativedonor substrate is uridine diphosphate glucose (UDPG).91. The kit of paragraph 90, wherein the glucose or glucose-derivativedonor substrate is radiolabeled.92. The kit of paragraph 91, wherein the uridine diphosphate glucose isradiolabeled with 14C or 3H.93. The kit of paragraph 85, wherein the protein that detects glucose orglucose-derivative modified nucleotides is selected from a groupcomprising a lectin, an antibody or antigen-binding fragment thereof, oran enzyme.94. 
The kit of paragraph 85, wherein the protein recognizes only theglucose or glucose-derivative.95. The kit of paragraph 85, wherein the protein recognizes the glucoseor glucose-derivative only in the context of 5-hydroxymethylcytosine.96. The kit of paragraph 93, wherein the antibody or antigen-bindingfragment thereof is modified with at least one tag.97. The kit of paragraph 96, wherein the tag is a biotin molecule, abead, a gold particle, or a fluorescent molecule.98. The kit of paragraph 93, wherein the enzyme is a hexokinase or abeta-glucosyl-alpha-glucosyl-transferase. 99. The kit of paragraph 93,wherein the lectin is Musa acuminata lectin (BanLec).100. The kit of paragraph 99, wherein the lectin is modified with a goldparticle or fluorescent tag.101. A method for diagnosing a myelodysplastic syndrome, amyeloproliferative disorder, acute myelogenous leukemia, systemicmastocytosis, or chronic myelomonocytic leukemia in an individual inneed thereof, the method comprising the steps of(i) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, in a tissue or cell sample from an individualin need thereof, and(ii) comparing the level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof in the tissue or cell sample from theindividual with a level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, from a normal control sample, wherein adifference in the level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, between the sample from the individual in needand the normal control sample is indicative of the individual having amyelodysplastic syndrome, a myeloproliferative disorder, acutemyelogenous leukemia, systemic mastocytosis, or chronic myelomonocyticleukemia.102. 
The method of paragraph 101, further comprising a step of comparingthe level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or acombination thereof, in a tissue or cell sample of the individual to alevel of 5-methylcytsosine, 5-hydroxymethylcytsosine, or a combinationthereof, in at least one sample from a diseased tissue or a diseasedcell, wherein if the level of 5-methylcytsosine,5-hydroxymethylcytsosine, or a combination thereof, in the tissue orcell sample from the individual in need is similar to the level of5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof,from at least one of the samples from the diseased tissue or diseasedcell then the individual is diagnosed with a myelodysplastic syndrome, amyeloproliferative disorder, acute myelogenous leukemia, systemicmastocytosis, or chronic myelomonocytic leukemia.103. A method for monitoring a disease progression or an effect of atherapy on a myelodysplastic syndrome, a myeloproliferative disorder,acute myelogenous leukemia, systemic mastocytosis, or chronicmyelomonocytic leukemia, the method comprising the steps of(i) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, in a tissue or a cell sample from anindividual in need thereof and establishing a baseline level of5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof,in the tissue or cell sample from the individual;(ii) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, in a tissue or cell sample from the individualat least one time following the establishment of the baseline level of5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof,thereby establishing at least one follow-up level of 5-methylcytsosine,5-hydroxymethylcytsosine, or a combination thereof, wherein a differencein the follow-up level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, relative to the baseline level of5-methylcytsosine, 
5-hydroxymethylcytsosine, or a combination thereof,is indicative of the progression of, or effect of a therapy on, amyelodysplastic syndrome, a myeloproliferative disorder, acutemyelogenous leukemia, systemic mastocytosis, or chronic myelomonocyticleukemia in the individual.104. A method for determining familial predisposition to amyelodysplastic syndrome, a myeloproliferative disorder, acutemyelogenous leukemia, systemic mastocytosis, or chronic myelomonocyticleukemia in an individual in need thereof, the method comprising (i)determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or acombination thereof in CD34+ cells from an individual in need thereof,(ii) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, in CD34+ cells from a family member of theindividual, wherein the family member is affected with a myelodysplasticsyndrome, a myeloproliferative disorder, acute myelogenous leukemia,systemic mastocytosis, or chronic myelomonocytic leukemia, and (iii)comparing the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or acombination thereof in the CD34+ cells from the individual in needthereof with the level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, in the CD34+ cells from the affected familymember, wherein an increase in the level of 5-methylcytsosine,5-hydroxymethylcytsosine, or a combination thereof, in the individualrelative to the 5-methylcytsosine, 5-hydroxymethylcytsosine, or acombination thereof level in the affected family member is indicative ofthe individual being predisposed to a myelodysplastic syndrome, amyeloproliferative disorder, acute myelogenous leukemia, systemicmastocytosis, or chronic myelomonocytic leukemia.105. 
A method for determining familial predisposition to amyelodysplastic syndrome, a myeloproliferative disorder, acutemyelogenous leukemia, systemic mastocytosis, or chronic myelomonocyticleukemia in an individual in need thereof, the method comprising (i)determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or acombination thereof in CD34+ cells from an individual in need thereof,(ii) determining a level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, in CD34+ cells from a family member of theindividual, wherein the family member is affected with a myelodysplasticsyndrome, a myeloproliferative disorder, acute myelogenous leukemia,systemic mastocytosis, or chronic myelomonocytic leukemia, and (iii)comparing the level of 5-methylcytsosine, 5-hydroxymethylcytsosine, or acombination thereof in the CD34+ cells from the individual in needthereof with the level of 5-methylcytsosine, 5-hydroxymethylcytsosine,or a combination thereof, in the CD34+ cells from the affected familymember, wherein a decrease in the level of 5-methylcytsosine,5-hydroxymethylcytsosine, or a combination thereof, in the individualrelative to the 5-methylcytsosine, 5-hydroxymethylcytsosine, or acombination thereof level in the affected family member is indicative ofthe individual being predisposed to a myelodysplastic syndrome, amyeloproliferative disorder, acute myelogenous leukemia, systemicmastocytosis, or chronic myelomonocytic leukemia.106. The method as in any of paragraphs 101-105, wherein the5-methylcytsosine, 5-hydroxymethylcytsosine, or a combination thereof,level is determined using an assay to detect cytosine-5-methylsulfonate107. 
A kit for the detection and purification of5-hydroxymethylcytosine, the kit comprising:(a) at least one catalytically active TET family enzyme, functional TETfamily derivative, TET catalytically active fragment, or combinationthereof for the conversion of 5-methylcytosine to5-hydroxymethylcytosine;(b) sodium bisulfite;(c) at least one protein to detect sodium bisulfite treated nucleotides;(e) standard DNA purification columns, buffers, and substrate solutions;and(f) packaging materials and instructions therein to use said kits.108. The kit of paragraph 107, wherein the protein that recognizessodium bisulfite treated nucleotide is specific forcytosine-5-methylsulfonate.109. The kit of paragraph 107, wherein the protein that detects sodiumbisulfite treated nucleotides is an antibody or antigen-binding fragmentthereof, an intrabody, or an enzyme.110. The kit of paragraph 107, wherein the antibody or antigen-bindingfragment thereof, intrabody, or enzyme is modified with at least onetag.111. The kit of paragraph 110, wherein the tag is a biotin molecule, abead, a gold particle, or a fluorescent molecule.112. The use of at least one catalytically active TET family enzyme,functional TET family derivative, TET catalytically active fragment, orcombination thereof, in the manufacture of a medicament for improvingthe generation of stable human Foxp3+ T cells, wherein an effectiveamount of o at least one catalytically active TET family enzyme,functional TET family derivative, TET catalytically active fragment, orcombination thereof, is contacted with, or delivered to, a human T cellto improve the generation of stable human Foxp3+ T cells.113. The use of paragraph 112, wherein the human T cell is a purifiedhuman CD4+ T cell. 114. The use of paragraph 112, further comprisinggenerating stable human Foxp3+ T cells by contacting the human T cellwith a composition comprising at least one cytokine, growth factor, oractivating reagent.115. 
The use of paragraph 114, wherein said composition comprises TGF-0.116. The use of at least one catalytically active TET family enzyme,functional TET family derivative, TET catalytically active fragment, orcombination thereof, in the manufacture of a medicament for improvingefficiency or rate with which an induced pluripotent stem (iPS) cell isproduced from a somatic cell, wherein an effective amount of at leastone catalytically active TET family enzyme, functional TET familyderivative, TET catalytically active fragment, or combination thereof,is contacted with, or delivered to, a somatic cell to improve theefficiency or rate with which an induced pluripotent stem (iPS) cell isproduced.117. The use of paragraph 116, further comprising inducing iPS cellproduction by contacting with or delivering to the somatic cell at leastone of a nucleic acid sequence encoding Oct-4, Sox2, c-MYC, or Klf4, ora combination thereof.118. The use of paragraph 117, wherein the at least one nucleic acidsequence encoding Oct-4, Sox2, c-MYC, or Klf4 is delivered in a viralvector, selected from the group consisting of an adenoviral vector, alentiviral vector, and a retroviral vector.119. The use of paragraph 116, further comprising contacting with, ordelivering to, a somatic cell an effective amount of a TET familyinhibitor.120. The use of paragraph 119, wherein the TET family inhibitor is aTET3 inhibitor.121. The use of paragraph 138, wherein the adult somatic cell is afibroblast.122. 
The use of at least one catalytically active TET family enzyme,functional TET family derivative, TET catalytically active fragment, orcombination thereof, in the manufacture of a medicament for improvingefficiency of cloning mammals by nuclear transfer or nucleartransplantation, wherein an effective 5-methylcytosine to5-hydroxymethylcytosine hydroxylating amount of at least onecatalytically active TET family enzyme, functional TET familyderivative, TET catalytically active fragment, or combination thereof,is contacted with a nucleus extracted from a cell to be cloned during anuclear transfer protocol.123. The use of paragraph 122, further comprising contacting a nucleusextracted from a cell to be cloned during a nuclear transfer protocolwith an effective amount of a TET family inhibitor.124. The use of paragraph 123, wherein the TET family inhibitor is aTET3 inhibitor. 125. The use of a detectably labeled antibody or aantigen-binding portion thereof, a labeled intrabody, or a labeledprotein, that specifically binds to 5-hydroxymethylcytosine fordetecting a 5-hydroxymethylcytosine nucleotide in a sample, wherein thepresence of the bound label is indicative of the presence of5-hydroxymethylcytosine in the sample.126. 
The use of at least one catalytically active TET family enzyme,functional TET family derivative, TET catalytically active fragment, orcombination thereof, or at least one nucleic acid molecule encoding atleast one catalytically active TET family enzyme, functional TET familyderivative, TET catalytically active fragment, or combination thereof,in the manufacture of a medicament for improving stem cell therapies,wherein an effective amount of at least one catalytically active TETfamily enzyme, functional TET family derivative, TET catalyticallyactive fragment, or combination thereof, or at least one nucleic acidmolecule encoding at least one catalytically active TET family enzyme,functional TET family derivative, TET catalytically active fragment, orcombination thereof, is contacted with, or delivered to, a stem cell forimproving stem cell therapies.127. The use of an agent that specifically modulates hydroxylaseactivity of a at least one catalytically active TET family enzyme,functional TET family derivative, TET catalytically active fragment, orcombination thereof, involved in transforming 5-methylcytosine into5-hydroxymethylcytosine in the manufacture of a medicament for treatingan individual with or at risk for cancer.128. The use of paragraph 127, wherein the agent that specificallymodulates hydroxylase activity is an inhibitor.129. The use of paragraph 127, wherein the agent that specificallymodulates hydroxylase activity is an activator.130. The use of paragraph 127, wherein the cancer is a myelodysplasticsyndrome, a myeloproliferative disorder, acute myelogenous leukemia,systemic mastocytosis, or chronic myelomonocytic leukemia131. The use of paragraph 127, wherein the cancer is a leukemia.132. The use of paragraph 131, wherein the leukemia is an acute myeloidleukemia comprising the t(10:11)(q22:q23) Mixed Lineage Leukemiatranslocation of TET1.133. 
The use as in any one of paragraphs 112, 116, 122, 126, or 127, wherein the catalytically active TET family enzyme is selected from the group consisting of TET1, TET2, TET3, and CXXC4.
134. The use as in any one of paragraphs 112, 116, 122, 126, or 127, wherein the functional TET family derivative comprises SEQ ID NO: 1.
135. The use as in any one of paragraphs 112, 116, 122, 126, or 127, wherein the TET family catalytically active fragment comprises SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 5.
136. The use of an enzyme that adds one or more glucose molecules to a 5-hydroxymethylcytosine residue in a nucleic acid for covalently tagging 5-hydroxymethylcytosine to generate glucosylated-5-hydroxymethylcytosine or gentiobiose-containing-5-hydroxymethylcytosine, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase.
137. The use of paragraph 136, wherein the 5-hydroxymethylcytosine is naturally occurring.
138. The use of paragraph 136, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to 5-hydroxymethylcytosine.
139. The use of an enzyme that utilizes labeled glucose or glucose-derivative donor substrates to add one or more labeled glucose molecules or glucose-derivatives to a 5-hydroxymethylcytosine residue in a nucleic acid to generate glucosylated-5-hydroxymethylcytosine or gentiobiose-containing-5-hydroxymethylcytosine for detecting 5-hydroxymethylcytosine, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase.
140. The use of paragraph 139, wherein the glucose or glucose-derivative donor substrate is a uridine diphosphate glucose.
141. The use of paragraph 139, wherein the labeled glucose or glucose-derivative donor substrate is radioactively labeled.
142. The use of paragraph 141, wherein the radioactive label is ¹⁴C or ³H.
143. The use of paragraph 139, wherein the 5-hydroxymethylcytosine is naturally occurring.
144. The use of paragraph 139, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to 5-hydroxymethylcytosine.
145. The use of a protein that recognizes a glucose molecule, glucose-derivative, or gentiobiosyl molecule for detecting the covalently tagged 5-hydroxymethylcytosine of paragraph 136.
146. The use of paragraph 145, wherein the protein recognizes only the glucose molecule, glucose-derivative, or gentiobiosyl molecule.
147. The use of paragraph 145, wherein the protein recognizes the glucose molecule, glucose-derivative, or gentiobiosyl molecule only in the context of 5-hydroxymethylcytosine.
148. The use of paragraph 145, wherein the protein is a lectin.
149. The use of paragraph 148, wherein the lectin is Musa acuminata lectin.
150. The use of paragraph 145, wherein the protein is an antibody or antibody fragment thereof.
151. The use of paragraph 150, wherein the antibody or antibody fragment thereof is modified with a tag.
152. The use of paragraph 151, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule.
153. The use of paragraph 145, wherein the protein is an enzyme.
154. The use of paragraph 153, wherein the enzyme is a hexokinase or beta-glucosyl-alpha-glucosyl-transferase.
155. The use of an enzyme and a glucose or glucose-derivative donor substrate for trapping covalent enzyme-DNA intermediates to detect a 5-hydroxymethylcytosine residue in a nucleic acid, wherein the enzyme is an alpha-glucosyltransferase, a beta-glucosyltransferase, or a beta-glucosyl-alpha-glucosyl-transferase.
156. The use of paragraph 155, wherein the glucose donor substrate is a uridine diphosphate glucose analog.
157. The use of paragraph 156, wherein the uridine diphosphate glucose analog is uridine diphosphate-2-deoxy-2-fluoro-glucose.
158. The use of paragraph 155, wherein the 5-hydroxymethylcytosine is naturally occurring.
159. The use of paragraph 155, further comprising the step of first contacting said nucleic acid with at least one catalytically active TET family enzyme, functional TET family derivative, TET catalytically active fragment, or combination thereof, thereby converting 5-methylcytosine to 5-hydroxymethylcytosine.
160. The use of paragraph 155, wherein the enzyme is tagged.
161. The use of an assay to detect 5-hydroxymethylcytosine in a nucleic acid, the assay comprising contacting a nucleic acid with sodium hydrogen sulfite to convert a 5-hydroxymethylcytosine residue in the nucleic acid to cytosine-5-methylsulfonate, and contacting the sodium hydrogen sulfite-contacted nucleic acid with a protein specific for cytosine-5-methylsulfonate.
162. The use of paragraph 161, wherein the protein is an antibody or antigen-binding fragment thereof, an enzyme, or an intrabody.
163. The use of paragraph 162, wherein the antibody comprises an antiserum.
164. The use of paragraph 162, wherein the antibody or antigen-binding fragment thereof, enzyme, or intrabody is modified with a tag.
165. The use of paragraph 164, wherein the tag is a biotin molecule, a bead, a gold particle, or a fluorescent molecule.
166. The use as in any one of paragraphs 136, 139, or 155, wherein the alpha-glucosyltransferase is encoded by a bacteriophage selected from the group consisting of T2, T4, and T6 bacteriophages.
167. The use as in any one of paragraphs 136, 139, or 155, wherein the beta-glucosyltransferase is encoded by a T4 bacteriophage.
168. The use as in any one of paragraphs 136, 139, or 155, wherein the beta-glucosyl-alpha-glucosyl-transferase is encoded by a bacteriophage selected from the group consisting of T2 and T6 bacteriophages.
169. The use as in any one of paragraphs 136, 139, 155, or 161, wherein the nucleic acid is contacted in vitro, in a cell, or in vivo.
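The rationale for the assay of paragraph 161 can be illustrated with a minimal Python sketch of how each cytosine state is read out after bisulfite treatment and PCR. It assumes the standard chemistry: unmodified cytosine is deaminated to uracil and read as thymine, 5-methylcytosine is largely resistant and read as cytosine, and 5-hydroxymethylcytosine is converted to cytosine-5-methylsulfonate, which also resists deamination and is read as cytosine. The one-letter codes `M` and `H` and the function names `bisulfite_read` and `cms_positions` are hypothetical conveniences for this sketch, not part of any described protocol.

```python
"""Toy model of bisulfite readout for C, 5mC, and 5hmC (illustrative only)."""

# Hypothetical one-letter codes used only in this sketch:
#   "C" = unmodified cytosine
#   "M" = 5-methylcytosine (5mC)
#   "H" = 5-hydroxymethylcytosine (5hmC)
BISULFITE_READOUT = {
    "C": "T",  # deaminated to uracil, copied as thymine during PCR
    "M": "C",  # largely resistant to deamination, read as cytosine
    "H": "C",  # becomes cytosine-5-methylsulfonate (CMS), read as cytosine
}


def bisulfite_read(template: str) -> str:
    """Return the base calls expected after bisulfite conversion and PCR."""
    # Bases other than the three cytosine states (A, G, T) pass through unchanged.
    return "".join(BISULFITE_READOUT.get(base, base) for base in template)


def cms_positions(template: str) -> list[int]:
    """Return positions that carry CMS after bisulfite treatment.

    These are the sites a CMS-specific protein (paragraph 161) would bind.
    """
    return [i for i, base in enumerate(template) if base == "H"]


if __name__ == "__main__":
    with_5mc = "ACGTCGMG"
    with_5hmc = "ACGTCGHG"
    # Both templates give the same read, so sequencing alone cannot tell them apart.
    print(bisulfite_read(with_5mc), bisulfite_read(with_5hmc))
    # Only the 5hmC template carries a CMS site for an anti-CMS protein to detect.
    print(cms_positions(with_5mc), cms_positions(with_5hmc))
```

Running the sketch prints `ATGTTGCG ATGTTGCG` and then `[] [6]`: the two templates are indistinguishable by base calls, and only the CMS position separates them, which is the information a CMS-specific protein would supply.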

REFERENCES

The references cited herein and throughout the specification and examples are herein incorporated by reference in their entirety.

-   1. R. B. Lorsbach et al., Leukemia 17, 637 (March, 2003).
-   2. R. Ono et al., Cancer Res 62, 4075 (Jul. 15, 2002).
-   3. F. Delhommeau et al., Blood 112, LBA-3 (November, 2008).
-   4. F. Viguie et al., Leukemia 19, 1411 (August, 2005).
-   5. C. Bogani et al., Stem Cells 26, 1920 (August, 2008).
-   6. G. Leone, M. T. Voso, L. Teofili, M. Lubbert, Clin Immunol 109, 89 (October, 2003).
-   7. L. Teofili et al., Int J Cancer 123, 1586 (Oct. 1, 2008).
-   8. S. R. Kornberg, S. B. Zimmerman, A. Kornberg, J Biol Chem 236, 1487 (May, 1961).
-   9. M. Winkler, W. Ruger, Nucleic Acids Res 21, 1500 (Mar. 25, 1993).
-   10. S. Kuno, I. R. Lehman, J Biol Chem 237, 1266 (April, 1962).
-   11. H. Hayatsu, M. Shiragami, Biochemistry 18, 632 (Feb. 20, 1979).
-   12. D. Zilberman, S. Henikoff, Development 134, 3959 (November, 2007).
-   13. L. Lariviere, N. Sommer, S. Morera, J Mol Biol 352, 139 (Sep. 9, 2005).
-   14. L. Lariviere, V. Gueguen-Chaignon, S. Morera, J Mol Biol 330, 1077 (Jul. 25, 2003).
-   15. J. Wicki, D. R. Rose, S. G. Withers, Methods Enzymol 354, 84 (2002).
-   16. I. J. Goldstein et al., Eur J Biochem 268, 2616 (May, 2001).

We claim:
 1. A method comprising: labeling a hydroxyl group on a hydroxymethylated residue in a nucleic acid to generate a labeled hydroxymethylated residue, wherein said nucleic acid is from an extracellular fluid sample; and sequencing said nucleic acid comprising said labeled hydroxymethylated residue.
 2. The method of claim 1, wherein said extracellular fluid sample is from a mammal.
 3. The method of claim 1, wherein said nucleic acid is a mammalian nucleic acid.
 4. The method of claim 1, wherein said labeling is covalent labeling.
 5. The method of claim 1, wherein said hydroxymethylated residue is a 5-hydroxymethylcytosine.
 6. The method of claim 5, wherein said labeling comprises glycosylating said 5-hydroxymethylcytosine.
 7. The method of claim 1, wherein said nucleic acid further comprises a methylated cytosine residue.
 8. The method of claim 7, wherein said methylated cytosine residue is a 5-methylcytosine.
 9. The method of claim 1, wherein said sequencing comprises high-throughput sequencing.
 10. The method of claim 1, further comprising binding said labeled hydroxymethylated residue to a support.
 11. The method of claim 10, wherein said binding occurs prior to said sequencing.
 12. The method of claim 1, wherein said labeling comprises associating a label with said hydroxymethylated residue.
 13. The method of claim 12, wherein said label comprises a sugar.
 14. The method of claim 12, wherein said label comprises a bead.
 15. A composition comprising a nucleic acid from an extracellular fluid sample, wherein said nucleic acid comprises a covalently labeled hydroxyl group on a hydroxymethylated residue.
 16. The composition of claim 15, wherein said extracellular fluid sample is from a mammal.
 17. The composition of claim 15, wherein said hydroxymethylated residue is a 5-hydroxymethylcytosine.
 18. The composition of claim 15, wherein said extracellular fluid sample is isolated from a subject.
 19. The composition of claim 15, wherein a label that is covalently associated with said hydroxyl group comprises a sugar.
 20. The composition of claim 19, wherein said sugar comprises a modified glucose.
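Claims 1, 10, and 11 describe labeling hydroxymethylated residues, binding them to a support, and sequencing. One common way to interpret such capture data, assumed here rather than taken from the claims, is to compare per-region read counts from the captured library with counts from an unenriched input library after normalizing each for library size. The function name `log2_enrichment`, the region labels, and the default pseudocount are hypothetical choices for this sketch, not a described analysis pipeline.

```python
"""Library-size-normalized enrichment of a capture library over input (sketch)."""

import math


def log2_enrichment(
    captured: dict[str, int],
    input_counts: dict[str, int],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """Return log2(captured CPM / input CPM) for every region seen in either library.

    A pseudocount is added to each region's count so that regions with zero
    reads in one library still give a finite value.
    """
    captured_total = sum(captured.values())
    input_total = sum(input_counts.values())
    if captured_total == 0 or input_total == 0:
        raise ValueError("both libraries must contain at least one read")

    result: dict[str, float] = {}
    # Regions missing from one library are treated as having zero reads there.
    for region in sorted(captured.keys() | input_counts.keys()):
        # Counts per million reads, so libraries of different depth are comparable.
        cap_cpm = (captured.get(region, 0) + pseudocount) / captured_total * 1e6
        inp_cpm = (input_counts.get(region, 0) + pseudocount) / input_total * 1e6
        result[region] = math.log2(cap_cpm / inp_cpm)
    return result


if __name__ == "__main__":
    captured = {"region_a": 15, "region_b": 5}
    input_counts = {"region_a": 3, "region_b": 17}
    for region, score in log2_enrichment(captured, input_counts).items():
        print(f"{region}\t{score:+.2f}")
```

With equal library sizes in the example, `region_a` scores log2(16/4) = +2, while `region_b` scores log2(6/18), about -1.58; a positive score marks a region over-represented among the captured, labeled fragments.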